MENTALHEALTH.INFOLABMED.COM - A medical Large Language Model (LLM) has identified individuals diagnosed with major depressive disorder (MDD) with high accuracy from short voice messages.

Specifically, the model achieved over 91 percent accuracy in detecting MDD among female participants.

The classification was based on a short WhatsApp audio recording in which participants described their week.

The findings were published on January 21, 2026, in the open-access journal PLOS Mental Health.

The study was led by Victor H. O. Otani and his colleagues from Santa Casa de São Paulo School of Medical Sciences and Infinity Doctors Inc., Brazil.

Major depressive disorder is a pervasive global mental health challenge, impacting over 280 million people worldwide.

Early detection of this condition is crucial for facilitating timely and effective treatment.

In this research, Otani and his team used machine learning models to distinguish individuals with MDD from those without it, based on their WhatsApp voice messages.

Methodology: Training and Testing the AI

The researchers used two separate datasets.

One dataset was used to train their LLMs, which incorporated seven different sub-models.

The second dataset was used to test the performance of the trained LLMs.

The training dataset comprised 86 participants.

It included 45 outpatients clinically diagnosed with major depressive disorder: 37 women and 8 men.

Additionally, a control group of 41 volunteers, consisting of 30 women and 11 men, had no depression diagnoses.

For the training phase, the outpatients' speech data came from WhatsApp audio recordings they had sent to their doctors' offices while symptomatic.

Control group participants, in contrast, shared routine WhatsApp voice messages of their choosing.

The dataset used to test the trained models involved 74 participants.

This test group included 33 outpatients, specifically 17 women and 16 men, diagnosed with MDD.

The control portion of the test group comprised 41 individuals, made up of 21 women and 20 men, who had no depression diagnoses.
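The participant counts reported above can be tallied in a few lines. The following is a minimal sketch in Python that uses only the figures given in this article; the `Group` class and `summarize` function are illustrative names, not anything taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """Participant counts for one arm (MDD or control) of a dataset."""
    women: int
    men: int


# Counts as reported in this article.
training = {"mdd": Group(women=37, men=8), "control": Group(women=30, men=11)}
test = {"mdd": Group(women=17, men=16), "control": Group(women=21, men=20)}


def summarize(dataset: dict) -> dict:
    """Return totals by gender and the share of women in a dataset."""
    women = sum(group.women for group in dataset.values())
    men = sum(group.men for group in dataset.values())
    total = women + men
    return {
        "women": women,
        "men": men,
        "total": total,
        "share_women": women / total,
    }


train_summary = summarize(training)
test_summary = summarize(test)

for name, summary in (("training", train_summary), ("test", test_summary)):
    print(
        f"{name}: {summary['total']} participants, "
        f"{summary['women']} women, {summary['men']} men, "
        f"{summary['share_women']:.1%} women"
    )
```

By these counts, women make up about 78 percent of the training dataset (67 of 86) but only about 51 percent of the test dataset (38 of 74).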

All participants provided informed consent and underwent screening to rule out potential confounding factors, such as other medical issues.

For the test dataset, the speech data for both the outpatient and control groups was standardized.

Each participant recorded WhatsApp messages in which they counted from one to ten and described their past week.

Crucially, all audio messages across both datasets were from native Brazilian Portuguese speakers.

Key Findings: A Closer Look at Accuracy

The LLMs classified women as depressed or not depressed more accurately than they classified men.

This higher performance was especially evident when the models analyzed data from participants describing their week.

The highest-performing model achieved an accuracy of 91.9 percent for women.

By contrast, the same model reached 75 percent accuracy when classifying male participants based on the "describe your week" audio.

This disparity may be attributable to the larger number of women in the model's training dataset, as well as to differences in speech patterns between men and women.

When given the "count to 10" data, the LLMs performed more similarly across genders.

The highest-performing model registered 82 percent accuracy for women and 78 percent accuracy for men in this specific task.
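The article does not describe how the per-gender accuracy figures were computed. As a general illustration only, accuracy for a subgroup is the fraction of that subgroup's participants whose predicted label matches their clinical label. The sketch below assumes a hypothetical list of (gender, true label, predicted label) records; the `accuracy_by_group` function and the toy records are made up for the example and are not data or code from the study.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each group.

    records: iterable of (group, true_label, predicted_label) tuples.
    Returns a dict mapping each group to its fraction of correct predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        if true_label == predicted_label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy records (1 = MDD, 0 = no MDD); NOT data from the study.
toy_records = [
    ("women", 1, 1), ("women", 0, 0), ("women", 1, 1), ("women", 0, 1),
    ("men", 1, 0), ("men", 0, 0), ("men", 1, 1), ("men", 0, 1),
]

toy_accuracy = accuracy_by_group(toy_records)
for group, value in toy_accuracy.items():
    print(f"{group}: {value:.0%} accuracy")
```

Reporting accuracy per subgroup in this way, rather than as a single pooled number, is what makes a gap such as the one between women and men visible.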

Future Outlook: A New Era for Mental Health Screening

The researchers are optimistic that continued refinement of their models will pave the way for a low-cost and practical method to screen individuals for depression.

Beyond screening, the models may have other clinical and research applications.

Senior author Lucas Marques stated, "Our study shows that subtle acoustic patterns in spontaneous WhatsApp voice messages can help identify depressive profiles with surprising accuracy using machine learning."

He added, "This opens a promising path for low-burden, real-world digital screening tools that respect people's daily communication habits."

The work is a step toward using everyday digital communication in accessible mental health screening.