Oxford University Press's
Academic Insights for the Thinking World

Speech, AI, and the future of neurology

Imagine what your life would be like if you did not know where you are or who you are with, and a young man told you, “We’re home and I’m your son.” Now imagine how you would feel if your body became still when you want to walk or shaky when you try to keep still. Do it. Take a moment and think about it.

Those who do not need to imagine are the 55 million people living with Alzheimer’s and the 10 million living with Parkinson’s, respectively, as they experience similar challenges every day. While these figures raise concern, future projections are alarming: by 2050, the number of cases is expected to double in high-income countries and triple in low- and middle-income countries. Things are particularly bleak in the latter, as they account for 60% of the cases but less than 25% of global investment in research, prevention, diagnosis, and treatment.

With growing patient-per-clinic ratios and soaring inequities across the globe, how will we detect these diseases early and massively enough for timely intervention? What solutions could balance the scales of brain health worldwide? An unsuspected answer involves combining natural speech and artificial intelligence. Yes, this sounds like another flight of imagination, but it all rests on solid science.

Tracking diseases

Alzheimer’s and Parkinson’s are neurodegenerative disorders, characterized by progressive atrophy of distinct brain regions. Alzheimer’s usually begins with neuron degeneration in the hippocampus and temporal lobe, affecting memory and several other abilities. In Parkinson’s, neuronal degradation begins in the basal ganglia, leading to motor and cognitive difficulties. Yet, this is just the tip of the iceberg. For patients, these diseases are disabling and often fatal. For families, they undermine emotional stability, financial solvency, and quality of life. For governments, they challenge health systems’ infrastructure and finances. Thus, these conditions project from the brain onto society, tracing a devastating trajectory.

That is why timely detection is crucial. Early diagnosis can mitigate the impact of symptoms, reduce their emotional burden on patients and caregivers, increase time to plan neuroprotective habits, and reduce costs by favoring routine over emergency care. Note that these diseases are incurable. Rapid and mass detection is our best alternative; and this is precisely where the need for innovation emerges.

Today, diagnosis rests on interviews with specialists, extensive paper-and-pencil tests, and, when conditions allow, brain MRI studies and biomarker assessments. These procedures are invaluable, but imperfect. Many countries lack enough qualified personnel and appropriate technology (and when these resources exist, their costs can be prohibitive). Furthermore, outcomes depend on the judgments of examiners, who vary in training and experience. Moreover, assessments are usually stressful and appointments take weeks or months. Worse yet, these limitations are exacerbated as patient numbers increase and socioeconomic disparities between countries deepen. An urgent need thus arises for new affordable, user-friendly, scalable, and immediate approaches.

Red flags in speech

This is where digital speech biomarkers come into play. Suppose that Tom, who is pushing 70, has been showing signs of cognitive decline and you suspect he might be suffering from Alzheimer’s. What if you asked him to recount a memory and an app detected traces of the disease in his speech? This non-invasive, low-cost approach offers real-time results without the need to visit a clinic, sparking great enthusiasm. Yet, how exactly does it work?

The key is that when we speak, we engage multiple brain regions that are affected by these diseases. Some, such as the hippocampus and the temporal lobe, are involved in accessing words as discourse unfolds; others, like the basal ganglia, coordinate the physical movements during speech production. So, if such regions were atrophied, one would expect alterations in the types of words used, their articulation, or other relevant aspects. By testing specific linguistic dimensions, we can uncover the integrity or dysfunction of those brain areas.

The first step is to record the natural speech of individuals with and without a given disease (the more, the better). Subsequently, complex algorithms quantify multiple aspects of the recording (say, speech rhythm) and its transcription (say, word properties). These metrics are used to train computational models that learn the typical speech characteristics of diagnosed individuals and healthy ones. Once the model is trained, it is presented with acoustic and linguistic measures from Tom, and, essentially, queried with this question: “Model, based on what you’ve learned, does Tom have the disease or not?”
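The train-then-query loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not the models used in the studies discussed here: the nearest-centroid classifier, the two features, the function names, and every number are assumptions made up for the example.

```python
from math import dist
from statistics import mean

# Hypothetical training data: [mean pause in seconds, mean log word frequency].
# All values are invented for illustration; real models rely on many
# more speakers and many more features.
TRAINING = [
    ([0.9, 5.2], "patient"),
    ([1.1, 5.0], "patient"),
    ([0.4, 4.1], "control"),
    ([0.5, 4.3], "control"),
]

def train_centroids(samples):
    """Average the feature vectors of each group (the 'learning' step)."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label of the group whose average is closest to the new speaker."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train_centroids(TRAINING)
tom = [0.95, 5.0]  # hypothetical measures extracted from Tom's recording
tom_label = predict(centroids, tom)
```

Here the "question" put to the model is simply which group's average Tom's measures sit closest to. A realistic system would also rescale the features and be evaluated on speakers it has never seen.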

In a recent study, our team identified Alzheimer’s disease with nearly 90% success via word property analysis. The model learned that patients, compared to their healthy peers, use words with higher frequency (‘doctor’ rather than ‘physician’), lower specificity (‘dog’ instead of ‘poodle’), and more common sound sequences (like ‘bat’, which resembles ‘cat’, ‘fat’, ‘mat’, ‘rat’, ‘bet’, ‘bit’, ‘bought’, ‘boot’, ‘bad’, ‘bag’, and ‘ban’; as opposed to ‘giraffe’, whose sound sequence is quite unique). Indeed, these lexical properties predicted the patients’ level of cognitive decline and brain atrophy. The reason is quite simple: word selection is a central function of semantic memory, which becomes impaired from the onset of temporo-hippocampal atrophy in Alzheimer’s. When navigating semantic memory, people with the disease prioritize the most accessible parts of their vocabulary, consisting of frequent, unspecific, and common-sounding words. And the more severe their disorder is, the simpler the words they favor.
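To make the idea of "common sound sequences" concrete, here is a minimal sketch that counts how many words in a tiny lexicon differ from a target word by a single sound. The lexicon, the rough hand-written phoneme transcriptions, and the substitution-only definition of a neighbor are all assumptions for illustration; they are not the measures or resources used in the study.

```python
# Rough phoneme transcriptions, written by hand for illustration only.
LEXICON = {
    "bat": ("b", "æ", "t"),
    "cat": ("k", "æ", "t"),
    "fat": ("f", "æ", "t"),
    "mat": ("m", "æ", "t"),
    "rat": ("r", "æ", "t"),
    "bet": ("b", "ɛ", "t"),
    "bit": ("b", "ɪ", "t"),
    "bought": ("b", "ɔ", "t"),
    "boot": ("b", "u", "t"),
    "bad": ("b", "æ", "d"),
    "bag": ("b", "æ", "g"),
    "ban": ("b", "æ", "n"),
    "giraffe": ("dʒ", "ə", "r", "æ", "f"),
}

def is_neighbor(a, b):
    """True if two phoneme sequences differ by exactly one substituted sound."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def neighbors(word, lexicon=LEXICON):
    """List the words in the lexicon that sound like `word` except for one phoneme."""
    target = lexicon[word]
    return sorted(w for w, phones in lexicon.items() if is_neighbor(target, phones))

bat_neighbors = neighbors("bat")          # many similar-sounding words
giraffe_neighbors = neighbors("giraffe")  # none in this toy lexicon
```

In this toy lexicon, ‘bat’ has eleven neighbors and ‘giraffe’ has none, which mirrors the contrast drawn above.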

In another study, we detected Parkinson’s disease with over 90% accuracy by measuring motor aspects of speech. We found that patients, compared to healthy individuals, leave longer pauses between words and produce less recognizable sounds. These patterns even differentiated between disease variants. Once again, the explanation is clear. Speech production requires coordinating movements of the tongue, lips, and vocal cords, among other organs. Since basal ganglia atrophy affects motor skills, these actions in people with Parkinson’s prove slow, shaky, and imprecise. The audio signal carries traces of these alterations.
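One of these motor measures, the pause between words, is easy to picture in code. The sketch below assumes that each word already comes with a start and end time (for instance, from an automatic transcription tool); the function name and the timings are hypothetical, and the study's actual measures may be computed differently.

```python
from statistics import mean

def pause_durations(words):
    """Given (word, start_sec, end_sec) tuples in spoken order,
    return the silent gap between each word and the next one."""
    return [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]

# Hypothetical timings for a three-word utterance.
timed_words = [("we", 0.0, 0.5), ("are", 0.75, 1.25), ("home", 2.25, 2.5)]

pauses = pause_durations(timed_words)
mean_pause = mean(pauses)
```

A summary value such as the mean pause could then serve as one of the features fed to a classification model.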

The breakthroughs do not stop there. These methods can anticipate who will develop specific conditions in the future. Some studies also suggest that they outperform standard tests in estimating disease severity and discriminating between syndromes. The approach has been validated with data acquired in hospitals and over the phone, incorporated into clinical trials, and harnessed by user-friendly apps. These are critical milestones towards rethinking clinical assessments.

A story in the making

For all its promise, this story is only beginning. The approach requires more validation, especially in large groups of patients. New studies should focus on vulnerable populations to balance the abundant data coming from high-income countries. More generally, a digital medical culture must be cultivated for clinicians to incorporate computational tools. Of course, these milestones demand concerted efforts of scientists, medical professionals, patients, family members, companies, and policymakers. None of this is easy or immediate. The path from science to clinical practice and public policy is long, crooked, and uphill.

Fortunately, this is not an isolated endeavor. Various teams are working on other digital tools for disease detection, including eye-tracking devices, motion sensors, and gamified cognitive tests. Speech analysis is part of a vast movement pushing for clinical equity through technological innovations.

To conclude, imagine that we can detect these diseases before Tom shows symptoms of decline. Imagine doing so while reducing the social gaps among the world’s nations. And imagine a future where all this no longer needs to be imagined. If such a day ever arrives, it will be through disruptions like this.

Featured image by TungArt7 via Pixabay.

Recent Comments

  1. Margaret Ellis

    Very interesting work which may also apply to the detection of brain tumours. I was aware a friend was having difficulty formulating sentences 2 years before her tumour was diagnosed. I wish I had spoken up.
