CODEX Digest - 9.18.25
Want this delivered straight to your inbox every Thursday? Subscribe now.
This week's digest features a new framework for teaching critical thinking with AI, a physician's personal experience of a life-threatening diagnostic delay, and a study using LLMs to diagnose rare diseases. Also highlighted this week are a systematic review showing diagnostic prediction models aren't ready for primary care use and evidence from 13 million clinical notes finding racial bias in how doctors assess patient credibility.
Here are this week's must-reads:
Titles link to the PubMed record or free-to-access sites with full text availability.
Educational strategies for clinical supervision of artificial intelligence use.
Abdulnour R-EE, Gin B, Boscardin CK. N Engl J Med. 2025;393(8):786-797.
This review examines strategies for teaching critical thinking when AI is used in clinical decision-making. It introduces the DEFT-AI framework (diagnosis, evidence, feedback, teaching, and recommendations for AI use), which helps educators assess how learners interact with AI and guide them toward critical thinking during care. It also describes different patterns of AI use, highlighting the types of tasks, potential risks, and concerns such as skill erosion from overreliance on AI.
Racial bias in clinician assessment of patient credibility: evidence from electronic health records.
Beach MC, Harrigian K, Chee B, et al. PLoS ONE. 2025;20(8):e0328134.
Identifying and addressing racial bias in clinical assessment is essential for achieving diagnostic excellence. This study used natural language processing (NLP) to analyze over 13 million clinical notes in the EHR and found that notes about Black patients were significantly more likely to include language undermining credibility, while those about Asian patients were more likely to include language supporting credibility. Credibility-undermining language included forms of “claims,” “insists,” and “poor historian.” Testimonial injustice perpetuated in documentation can harm Black individuals by delaying or misattributing diagnoses, and it reflects a failure of healthcare institutions to validate Black patients, contributing to healthcare untrustworthiness.
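To make the flagging concrete: the kind of term detection described above can be approximated with simple pattern matching. This is a minimal sketch using an assumed three-term lexicon, not the study's actual NLP pipeline, which is far more extensive:

```python
import re

# Illustrative lexicon built from the terms named in the summary;
# the study's actual method and term list are more sophisticated.
UNDERMINING_PATTERNS = [
    r"\bclaim(s|ed|ing)?\b",
    r"\binsist(s|ed|ing)?\b",
    r"\bpoor historian\b",
]

def flags_credibility_undermining(note_text: str) -> bool:
    """Return True if a note contains a credibility-undermining phrase."""
    lowered = note_text.lower()
    return any(re.search(p, lowered) for p in UNDERMINING_PATTERNS)

print(flags_credibility_undermining("Patient claims the pain began yesterday."))  # True
print(flags_credibility_undermining("Patient reports pain since yesterday."))     # False
```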
Derksen C, Walter FM, Akbar AB, et al. Implement Sci. 2025;20(1):33.
Clinical decision support systems (CDSS) have been shown to improve diagnostic excellence, but their uptake and implementation are low. This systematic review spotlights barriers to CDSS implementation and usability in primary care. Based on the review, development teams should use diagnostic examples and engage with primary care and frontline users to improve the feasibility of implementation.
Egerton-Warburton D, Lim A, Tan YH, et al. Emerg Med Australas. 2025;37(4):e70086.
Clinician impressions of patient encounters serve as a starting point for diagnostic and treatment processes. This study reviewed 432 patient records from four Australian emergency departments and found clinically significant impressions, defined as including at least one diagnosis or an acknowledgement of diagnostic uncertainty, to be missing in 23.4% of cases. The information gap included a lack of documented differential diagnoses, patient vulnerabilities, and other special factors. These omissions have the potential to affect diagnostic accuracy and result in patient harm.
Implementation of a curriculum on communicating diagnostic uncertainty for clerkship-level medical students: a pseudorandomized and controlled study. (subscription required)
Etherington NB, McQuade CN, Kohli A, et al. Diagnosis (Berl). 2025;12(3):341-348.
Explicit acknowledgement of diagnostic uncertainty to patients and families is a communication challenge that can erode trust. This study enrolled 54 internal medicine clerkship students in a curriculum designed to develop skills for communicating diagnostic ambiguity to patients. In a clinical skills test environment after the program, curriculum participants demonstrated superior skill in communicating uncertainty compared with students who did not participate.
Prospective evaluation of missed musculoskeletal injuries in trauma: PREVENT study.
Hong SS, Tcharkhedian E, O’Regan W, et al. J Eval Clin Pract. 2025;31(4):e70182.
Diagnostic excellence can be supported by a range of providers. This Australian study used a convenience sample of adult patients presenting to a trauma unit to compare whether physiotherapists or trauma teams were more likely to find injuries missed by the other when each performed the post-admission patient assessment first. The difference in missed injuries identified by the two groups was not significant, supporting the role of the physiotherapy team in trauma assessments to help ensure diagnostic accuracy for musculoskeletal injuries.
Hunik L, Chaabouni A, van Laarhoven T, et al. JMIR Med Inform. 2025;13:e62862.
A diagnostic prediction model uses patient data to estimate how likely a patient is to have a certain disease or condition. This systematic review evaluated the potential of AI-generated prediction models built from EHR data in primary care. The analysis found that most prediction models target a single condition. Only two studies tested a model in a real-world primary care setting, and the evidence base carried a high risk of bias. The authors conclude it is premature to implement diagnostic prediction models in primary care.
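For readers unfamiliar with the term, here is a minimal sketch of what a diagnostic prediction model looks like in practice, using logistic regression on synthetic data. The features, data, and library choice are illustrative assumptions, not drawn from the reviewed studies:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for structured EHR features (e.g., age, lab value,
# symptom flag) and a binary disease outcome; purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Estimated probability that a new patient has the condition.
new_patient = np.array([[0.8, 1.2, -0.3]])
print(model.predict_proba(new_patient)[0, 1])
```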
Community-based screening events to increase lung cancer screening in an urban AI/AN clinic. (subscription required)
Johnson E, Garagiola A, Stately A, et al. J Health Care Poor Underserved. 2025;36(3):771-781.
Community screening initiatives can help with early cancer diagnosis. This commentary highlights a successful partnership between a federally qualified health center clinic and a large hospital system. The local initiative held health fairs and tabling events with direct invitations to eligible Minnesota American Indian/Alaska Native (AI/AN) patients, and the two events supported shared decision making and provided transportation for lung cancer screening. The one-day events resulted in six and seven patients completing screening, respectively, close to the number the clinic had screened over the previous nine months. The work shares insights into improving screening for Indigenous populations.
Large language models for rare disease diagnosis at the undiagnosed diseases network.
Shyr C, Cassini TA, Tinker RJ, et al. JAMA Netw Open. 2025;8(8):e2528538.
The Undiagnosed Diseases Network (UDN) employs team-based collaboration to determine diagnoses for patients struggling with rare, undiagnosed conditions. This research letter reports on a cohort study using LLMs to diagnose cases from the UDN. The LLM identified the final diagnosis in 13.3% of cases. The results offer insight into the utility of LLMs as a diagnostic tool for clinicians, specifically for rare diseases.
Bridging diagnostic safety and mental health: a systematic review highlighting inequities in autism spectrum disorder diagnosis. (subscription required)
Srivarathan A, Bradford A, Shearkhani S, et al. BMJ Qual Saf. Epub 2025 Aug 25.
Mental disorders can be difficult to diagnose, and systemic biases can contribute to errors and delays that undermine effective care for disadvantaged and marginalized patient groups. This systematic review found evidence of diagnostic inequities for mental health conditions across the United States. The results highlight concerns specific to autism spectrum disorder but cover a range of contributors to diagnostic inequity, including clinician bias, language preference, health literacy, and insurance coverage issues.
AI should read mammograms only when confident: a hybrid breast cancer screening reading strategy.
Verboom SD, Kroes J, Pires S, et al. Radiology. 2025;316(2):e242594.
Heavy workloads can lead to mistakes in diagnosis, and integrating AI into screening processes has the potential to reduce radiologist workload. In this study, an AI model estimated the malignancy probability of each mammogram and reported whether it was uncertain of its own assessment. Images flagged with high uncertainty received a second review by a radiologist. This hybrid approach achieved a cancer detection rate similar to traditional double reading by two radiologists. These results suggest that AI review gated by uncertainty metrics could lower radiologist workload without affecting cancer identification.
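The gating logic behind such a hybrid strategy is simple to express. This is a minimal sketch, not the study's implementation; the `triage` function, the 0.2 uncertainty threshold, and the 0.5 recall cutoff are all assumed for demonstration:

```python
# Minimal sketch of an uncertainty-gated hybrid reading strategy.
# Threshold and cutoff values are illustrative, not from the study.

def triage(probability: float, uncertainty: float,
           uncertainty_threshold: float = 0.2,
           recall_cutoff: float = 0.5) -> str:
    """Route a screen based on the AI reader's output."""
    if uncertainty > uncertainty_threshold:
        # AI is unsure of its own assessment: send to a radiologist.
        return "radiologist_review"
    # AI is confident: act on its malignancy probability directly.
    return "recall" if probability >= recall_cutoff else "no_recall"

# Example: a confident low-risk screen vs. an uncertain one.
print(triage(probability=0.05, uncertainty=0.08))  # -> no_recall
print(triage(probability=0.40, uncertainty=0.35))  # -> radiologist_review
```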
The lost humanity of listening deeply.
Yaeger JP, Baker KR. J Hosp Med. Epub 2025 Aug 3.
In this first-person account, a physician describes his frustration as a patient when his correct and life-threatening diagnosis was delayed despite his efforts to speak up and participate in the process. He reflects on how much harder the experience might be for patients from marginalized groups without his knowledge or professional background. The piece explores why doctors sometimes don’t listen to patients, pointing to factors like cognitive biases, time pressures, administrative demands, and misaligned incentives.
**The first author will be featured in CODEX's upcoming webinar.
About the CODEX Digest
Stay current with the CODEX Digest, which cuts through the noise to bring you a list of recent must-read publications handpicked by the Learning Hub team. Each edition features timely, relevant, and impactful journal articles, books, reports, studies, reviews, and more selected from the broader CODEX Collection—so you can spend less time searching and more time learning.
Get the latest in diagnostic excellence, curated and delivered straight to your inbox every week:
See past digests here.
