How A Chip Is Helping A Paralyzed Woman Talk Through An Avatar


In a groundbreaking achievement, a woman who had been paralyzed for 18 years has regained her ability to speak. Researchers intercepted her brain signals and transformed them into a talking avatar, complete with facial expressions and a voice reconstructed from recordings made before her injury.

Ann, a 48-year-old who was paralyzed by a brainstem stroke at the age of 30, was the beneficiary of this technology. Researchers at the University of California, San Francisco implanted a paper-thin rectangle of 253 electrodes directly onto the surface of her brain, over the area crucial for speech, and then harnessed artificial intelligence to develop a cutting-edge brain-computer interface (BCI).

The BCI intercepted Ann’s brain signals, effectively enabling her to “speak” through them. The signals traveled via a cable, connected to a port affixed to her head, to a bank of powerful computers, which translated them into text at an impressive rate of 80 words per minute. To give that text a voice, an audio recording of Ann speaking at her wedding, made before her stroke, was used to recreate how she sounded. The reconstructed voice was paired with an on-screen avatar that not only spoke for her but also displayed facial expressions to convey her emotions.

This pioneering achievement, spearheaded by the team at the University of California San Francisco, represents a historic breakthrough as it is the first instance where both speech and facial expressions have been synthesized directly from brain signals.

“Our goal is to restore a full, embodied way of communicating, which is really the most natural way for us to talk with others,” said Dr. Edward Chang, chair of neurological surgery at UCSF. “These advancements bring us much closer to making this a real solution for patients.”

Ann spent weeks collaborating with the team, training the system’s artificial intelligence algorithms to recognize the brain signals underlying her attempts at speech.

This extensive effort entailed repeatedly attempting to say phrases from a conversational vocabulary of 1,024 words until the computer could identify the distinctive brain activity patterns linked to each sound.

Rather than teaching the AI to recognize complete words, the researchers devised a method to decode words based on their constituent phonemes. For instance, the word “Hello” could be broken down into four phonemes: “HH,” “AH,” “L,” and “OW.”

By adopting this approach, the computer only needed to learn 39 phonemes to decode any English word, which significantly improved the system’s accuracy and tripled its speed.
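The idea of decoding words from a small inventory of phonemes can be illustrated with a short sketch. This is not the study’s actual decoder, which is a neural network trained on cortical recordings; it is a toy example, assuming a hypothetical miniature pronunciation dictionary (real systems use large lexicons), that shows how sequences drawn from just 39 ARPAbet-style phonemes can be mapped back onto English words.

```python
# Toy illustration of phoneme-to-word decoding (not the study's method).
# The decoder only ever has to distinguish 39 phonemes, yet their
# sequences can spell out any English word via a pronunciation lexicon.

# Hypothetical miniature lexicon; real systems use large dictionaries.
PRONUNCIATIONS = {
    ("HH", "AH", "L", "OW"): "hello",   # the article's example word
    ("W", "ER", "L", "D"): "world",
    ("G", "UH", "D"): "good",
}

def decode_words(phoneme_stream):
    """Greedily match the longest known phoneme sequence at each position."""
    words = []
    i = 0
    while i < len(phoneme_stream):
        for j in range(len(phoneme_stream), i, -1):
            chunk = tuple(phoneme_stream[i:j])
            if chunk in PRONUNCIATIONS:
                words.append(PRONUNCIATIONS[chunk])
                i = j
                break
        else:
            i += 1  # skip a phoneme with no matching word
    return words

print(decode_words(["HH", "AH", "L", "OW", "W", "ER", "L", "D"]))
# → ['hello', 'world']
```

A real decoder also weighs how likely each phoneme is given the neural signal and uses a language model to pick plausible word sequences; the greedy lookup above only conveys the core dictionary idea.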

“The accuracy, speed, and vocabulary are crucial,” said Sean Metzger, who developed the text decoder in the joint Bioengineering Program at UC Berkeley and UCSF. “It’s what gives a user the potential, in time, to communicate almost as fast as we do, and to have much more naturalistic and normal conversations.”

Using a tailored machine learning procedure that synchronized the avatar software with the signals coming from Ann’s brain, the computer avatar replicated her intended physical actions. This encompassed opening and closing the jaw, moving the lips, and orchestrating tongue movements, in addition to mirroring facial expressions such as happiness, sadness, and surprise.

Currently, the team is actively developing a wireless iteration of this technology, eliminating the need for users to remain tethered to computers.

The latest study, published in the journal Nature, extends prior research by Dr. Chang’s team, which had previously translated brain signals into text for another individual who had suffered a brainstem stroke. The new work goes further, decoding those signals into the nuances of speech and the dynamic facial animations that accompany conversation.