Editor's Pick: AI-based clinical support in the real world

“AI-based Clinical Decision Support for Primary Care: A Real-World Study” (pre-print)

Robert Korom, Sarah Kiptinness, Najib Adan, Kassim Said, Catherine Ithuli, Oliver Rotich, Boniface Kimani, Irene King’ori, Stellah Kamau, Elizabeth Atemba, Muna Aden, Preston Bowman, Michael Sharman, Rebecca Soskin Hicks, Rebecca Distler, Johannes Heidecke, Rahul K. Arora, and Karan Singhal
arXiv
(Note: This is a pre-print that has not completed the peer review process.)
July 22, 2025

Read the paper

What's the point?

This study is one of the largest real-world tests of an LLM in a primary care setting to date. Researchers studied nearly 40,000 patient visits across 15 clinics in Nairobi, Kenya. Clinicians were randomized into two groups: one used AI Consult, an LLM-based tool from OpenAI (the study's funder) built into the electronic medical record that functions as a clinical decision support system (CDSS), while the other provided standard care.

The AI tool ran quietly in the background, flagging potential errors in patient history, diagnosis, and treatment only when needed, without disrupting workflows or harming patients. Independent physicians rated diagnostic quality on a 5-point scale, treating visits with low ratings as carrying a high risk of error.

Clinicians in the AI group had a 16% lower risk of diagnostic errors and a 13% lower risk of treatment errors.

🌟 The Bottom Line: When seamlessly integrated as a background safety net, an LLM co-pilot reduced diagnostic and treatment errors without workflow disruption, demonstrating AI's potential to meaningfully improve clinical quality.

Why does this matter? 

AI tools often look promising in theory but rarely get tested in real-world settings. This study, conducted in a high-volume, urban clinic network in Kenya, stands out for its rigorous, real-world deployment. It's the first of its kind to show how an LLM-based clinical decision support tool can be successfully integrated into live clinical workflows and reduce the risk of clinical errors without disrupting care or undermining clinician autonomy.

While it offers a powerful model for user-centered AI integration, the reported error reductions may be inflated by unusually high baseline error rates and a non-standard method of measuring diagnostic errors. The study's real value lies in demonstrating how to thoughtfully implement AI in clinical practice, providing a blueprint that other health systems can adopt and scale.

Who does this impact?

Patients: This study shows AI may improve safety without changing how you interact with your doctor. The AI acts as an invisible safety net, catching errors while keeping your physician at the center of care. No patients were harmed during the study. 

Clinicians: This research shows how well-designed AI tools can strengthen your practice without disrupting workflow or undermining autonomy. The AI functions like an extra set of eyes running quietly in the background. Over time, clinicians in the AI group made fewer errors even before being alerted, suggesting these tools may reinforce better clinical habits.

Healthcare leaders: This study bridges the gap between promising AI models and real-world implementation by demonstrating what works in practice. The researchers show how they encouraged staff to follow AI recommendations through quality improvement cycles, providing a blueprint for scaling AI tools and new approaches to quality monitoring.

___

This Editor's Pick was curated by Blen Gebremeskel, MD, CODEX Project Intern  

Share your thoughts and join the conversation on LinkedIn.

About Editor's Picks

Curated by the UCSF CODEX team, each Editor’s Pick features a standout study or article that moves the conversation on diagnostic excellence forward. These pieces offer meaningful, patient-centered insights, use innovative approaches, and speak to the needs of patients, clinicians, researchers, and decision-makers alike. All are selected from respected journals or outlets for their rigor and real-world relevance.  

View more Editor's Picks here.