CODEX Digest - 6.12.25
This week, we highlight studies that examine how clinical reasoning is shaped—from delayed referrals and emergency triage to AI system pitfalls and medical education influences—and what it takes to build more reliable, equitable, and informed diagnostic decision-making.
Want this delivered straight to your inbox every Thursday? Subscribe now.
Here are this week's must-reads:
Titles link to the PubMed record or free-to-access sites with full text availability.
Red teaming ChatGPT in medicine to yield real-world insights on model behavior.
Chang CT, Farah H, Gui H, et al. NPJ Digital Med. 2025;8(1):149.
Recognizing information system weaknesses informs design to improve the system’s reliability as a decision-making tool. This proof-of-concept study applies red teaming (an established method of exposing information system weaknesses and their potential for unintended negative consequences) to identify critical shortcomings that negatively affect clinician decision making. The red teaming exercise identified incorrect diagnostic approaches, treatment inaccuracy, and biased determinations in model output. The work highlights how transdisciplinary teams that include medical students and clinicians help learners recognize shortcomings of AI for clinical care.
Patient perspectives on delayed specialty follow-up after a primary care visit.
Fernández L, Ricci D, Pollack A, et al. J Am Board Fam Med. Epub 2025 Apr 23.
Referrals that are delayed or not completed are known detractors from diagnostic excellence. This qualitative study interviewed primary care patients whose referrals were delayed or never completed and found that patients didn’t follow through because they didn’t understand the urgency of, or the rationale behind, the referral. Clinicians can communicate clear and meaningful referral information to their patients and advocate for consistent appointment booking processes.
**This was a recent CODEX Editor's Pick.
Clerkship students' use of clinical reasoning concepts after a pre-clinical reasoning course.
Kulkarni SA, Dhaliwal G, Teherani A, et al. J Gen Intern Med. 2025;40(6):1359-1366.
Clinical reasoning skills, such as illness scripts and prioritized differential diagnosis, are core to effective diagnostic and treatment decisions. This qualitative analysis of participant insights on a clinical reasoning class and its impact on their clerkship found that students used reasoning concepts, but contextual and supervisor factors facilitated or diminished use of the clinical reasoning framework. The authors propose a model that educators can apply to help students translate clinical reasoning concepts to frontline care.
From System 1 to System 2: a survey of reasoning large language models. (preprint)
Li ZZ, Zhang D, Zhang ML, et al. arXiv. Epub 2025 Apr 25.
Heuristic decision making can demonstrate expertise but can also harbor bias and overconfidence that negatively affect diagnostic accuracy. This preprint examines LLM reasoning modalities to inform the evolution of models away from quick decision-making actions toward more deliberate, human-like reasoning. The authors’ comparative discussion aims to inform the development and advancement of LLMs’ reasoning capacity. A range of existing models is reviewed, including those designed to inform medical care.
Diagnostic expertise in the emergency department. (subscription required)
Newsome E, Klein G, Hoy K, et al. J Cogn Eng Decis Mak. Epub 2025 Apr 22.
Diagnostic expertise in active clinicians is routinely tested in the challenging emergency care environment. This interview study explored critical decision-making experiences with emergency department (ED) personnel and identified six core elements of expertise as evidenced in the ED, including sensemaking and teamwork capacity. The results noted the importance of follow-up and feedback in developing expertise and the need for robust system-level changes to enable those opportunities.
The urgency of centering safety-net organizations in AI governance.
Nong P, Maurer E, Dwivedi R. NPJ Digital Med. 2025;8:117.
The digital divide has the potential to detract from socioeconomic and healthcare equity in ways that affect diagnosis, treatment, and innovation opportunities. This perspective recommends that AI governance and development leaders engage safety-net organizations in design, policy, and implementation strategies to reduce the risks AI systems pose to marginalized patients and communities.
**The lead author participated in a recent CODEX webinar on AI bias.
The Age of Diagnosis: How Our Obsession with Medical Labels Is Making Us Sicker.
O'Sullivan S. Thesis; 2025. ISBN: 978-0593852910.
Check out some news coverage on this publication.
When it comes to benchmarks, humans are the only way. (subscription required)
Rodman A, Zwaan L, Olson A, et al. NEJM AI. 2025;2.
Large language models have the potential to address gaps in clinical reasoning that can lead to poor diagnostic outcomes and patient harm. Autonomous AI systems present opportunities, but the authors highlight the need for effective evaluation measures that incorporate human knowledge in the evaluation process. They propose using existing frameworks, developing new assessments for key cognitive actions, focusing on human-computer collaboration, and transitioning from models to clinical trials to assess system safety.
Emergency department triage accuracy and delays in care for high-risk conditions.
Sax DR, Warton EM, Mark DG, et al. JAMA Netw Open. 2025;8:e258498.
Appropriate triage of patients on presentation initiates timely diagnostic processes. This retrospective cohort study examined 5,929 patients diagnosed with subarachnoid hemorrhage (SAH), aortic dissection (AD), or ST-elevation myocardial infarction (STEMI). Undertriage resulted in delays in diagnostic and care orders for patients with SAH and AD but not for those with STEMI. The results underscore the importance of effective triage as a component of diagnostic and care excellence.
van Sassen C, van den Broek W, Bindels P, et al. Perspect Med Educ. 2025;14(1):194-207.
Learning from mistakes is widely promoted as a route to improvement, though educators have questioned how effective it is as a strategy. This study examined the language and context of three framings of diagnostic risk discussions: legal, error, and neutral. The authors found that residents recalled fewer clinical details, and more claim-specific details, from cases presented in a malpractice context than from the other scenarios, while emotional reactions to the cases remained constant across all three. This suggests that the framing and presentation of diagnostic error feedback affects clinical recall among learners.
About the CODEX Digest
Stay current with the CODEX Digest, which cuts through the noise to bring you a list of recent must-read publications handpicked by the Learning Hub team. Each edition features timely, relevant, and impactful journal articles, books, reports, studies, reviews, and more selected from the broader CODEX Collection—so you can spend less time searching and more time learning.
Get the latest in diagnostic excellence, curated and delivered straight to your inbox every week:
See past digests here.