CODEX Digest - 5.14.26 | Coordinating Center for Diagnostic Excellence

Want this delivered straight to your inbox every Thursday? Subscribe now.

This week's digest features a study addressing systems-level diagnostic safety issues surrounding radiology-primary care communication failures (#1), a national study identifying inequitable access to mammography screening among transgender and gender diverse individuals (#2), and a community-focused study that asked participants about their hopes and concerns surrounding the use of AI in healthcare (#8).

Titles link to the PubMed record or free-to-access sites with full text availability.

1) Electronic health record usability for management of actionable incidental imaging findings: A survey of primary care providers.

Bechel MA, Kelly A, Krupinski E, et al. Curr Probl Diagn Radiol. Epub 2026 Mar 24.

Effective communication of abnormal test findings amongst the care team supports diagnostic timeliness. Actionable incidental imaging findings (AIF) appear in up to one-third of radiology reports and can be challenging for primary care physicians (PCPs) to manage. This survey of PCPs at a multi-site academic institution assessed three EHR-based AIF follow-up systems for incidental pulmonary nodules, finding a preference for knowledge push over knowledge pull systems.

2) Nationwide mammographic screening among a large population of underserved subgroups.

Cathcart-Rake EJ, Thao V, Le-Rademacher J, et al. J Clin Oncol. 2026;44(11):981-991.

Equitable access to screening is a valuable strategy for early cancer detection. This national study examines screening mammography rates among transgender and gender diverse individuals. Results show cisgender women and transgender men have higher screening rates, indicating clinicians offer mammography primarily to those at average or high breast cancer risk. Recommendations should be tailored to a person's anatomy, hormones, family history, and genetics rather than sex assigned at birth, improving identification of those needing screening.

3) Changing clinician behavior in geriatrics: point-of-care alerts for prostate-specific antigen screening.

Duong N, Lee JY, Peprah Y, et al. Am J Prev Med. Epub 2026 Feb 9.

PSA screening in older men can result in overdiagnosis and overtreatment. This study evaluates how a behavioral science-based clinical decision support (CDS) system reduced unnecessary PSA testing by clinicians over 18 months. Lower alert rates indicated clinician learning, but persistent alerts with increased orders highlighted the need for additional strategies to engage less responsive clinicians.

4) Major Trauma Triage Study (MATTS): diagnostic accuracy of major trauma triage tools in English regional trauma networks - A case-cohort study.

Fuller GW, Baird J, Keating S, et al. PLoS ONE. 2026;21(3):e0344996.

Under- and over-triage in emergency cases detract from effective diagnostic and treatment processes. This case-cohort study externally validated trauma triage tools in England, identifying four with varying sensitivity and specificity. The optimal tool depends on major trauma prevalence and the importance placed on false positives versus false negatives. Further real-world evaluation, considering compliance and clinical judgment, is needed.

5) Managing and communicating diagnostic uncertainty in pediatric emergency care: national insights and opportunities for intervention.

Geanacopoulos AT, Totman MS, Miller KA, et al. Pediatr Emerg Care. Epub 2026 Apr 23.

Diagnostic uncertainty communication is essential to diagnostic excellence, influencing safety, trust, escalation, and follow-up, but remains understudied in pediatric emergency care. This interview study explores how emergency physicians manage and communicate uncertainty, revealing it as a multidimensional, contextual experience involving clinicians and caregivers. Communication tools, decision support, and system-level interventions can enhance diagnostic safety.

6) Enhanced medical diagnostic reasoning in small language models using reinforcement learning.

Gebreab S, Musamih A, Salah K, et al. Int J Cogn Comput Eng. 2026;7:400-411.

LLMs may help physicians during diagnostic decision making but remain susceptible to cognitive biases or missed diagnostic possibilities. This study examines whether extra training with reinforcement learning could improve the reasoning of small language models, which have fewer parameters and have been calibrated on a subset of data for a specific use. Using patient records, the researchers improved their models' diagnostic accuracy with only a small amount of training data and at minimal cost. The results suggest smaller AI models could become affordable tools to support better medical diagnosis.

7) Qualitative analysis of mental health clinicians' perspectives on external barriers to diagnosing anxiety disorders in the Veterans Health Administration.

Gentz A, Stewart RA, Chen PV, et al. Psychol Serv. Epub 2026 Mar 12.

Anxiety disorders are at least twice as common among veterans as in the general population, yet many patients receive a nonspecific “unspecified anxiety disorder” diagnosis that may limit targeted treatment. This qualitative study found that limited diagnostic tools, time constraints, incomplete histories, poor referral information, patient difficulty describing symptoms, and mental health stigma all hinder diagnostic specificity. Clinicians suggested better screening tools, more training, improved communication, and greater use of collateral information to improve diagnosis.

8) Community perspectives on health AI: hopes, concerns and implications for health systems and trustworthy AI.

Ryan KA, Sielaff ML, Saleem D, et al. AI Ethics. 2026;6(2):176.

Growing use of AI in healthcare has raised issues around trust, transparency, and governance. In five virtual sessions with 159 racially and socioeconomically diverse Michigan community members, participants saw benefits like improved diagnostics and efficiency, but worried about privacy, lack of clarity, reduced human contact, and weak oversight. The results support the need for clear communication about AI use, inclusive regulation, and maintaining human judgment alongside AI tools.

9) AI at the bedside: randomised controlled trial of ChatGPT's impact on student performance in real-patient clinical exams.

Saloojee H, Gramanie MC, Mwali R, et al. Med Teach. Epub 2026 Apr 20.

Generative AI tools are entering clinical training faster than curricula and assessments can adapt, and their real-time impact on bedside performance remains unclear. This South African randomized study of final-year medical students assessed whether allowing ChatGPT use during ward-based observed clinical exams with real patients improved student performance. Results showed no difference in clinical performance between ChatGPT-assisted and unassisted students.

10) The effect of medical explanations from large language models on diagnostic accuracy in radiology.

Spitzer P, Hendriks D, Rudolph J, et al. NPJ Digital Med. 2026;9(1).

LLMs are increasingly used for diagnostic support, yet the ideal format for their explanations remains unclear. In this large-scale German vignette study, explanation format significantly influenced diagnostic accuracy. Requesting "chain of thought" output for a diagnosis helped physicians identify and correct potential errors in LLM predictions, improving decision-making. These findings highlight the importance of explanation design in clinical AI.

11) Defining delayed and missed diagnosis of acute and chronic coronary syndromes: a modified Delphi consensus method.

Van Schalkwijk DL, Dobbe ASM, Mommersteeg PMC, et al. Open Heart. 2026;13(1):e003998.

Lack of standardized definitions for delayed and missed coronary artery disease (CAD) diagnosis has limited research on its causes and frequency. Because no single definition emerged, this multi-stakeholder consensus project developed a framework and recommended definitions for delayed and missed diagnosis in acute and chronic coronary syndromes. Although not intended as universal definitions, the framework provides a foundation for more consistent research and comparison across diagnostic settings.

12) Trust, scrutiny, or collaboration? A performance-based framework for human–AI interaction in medicine.

Zwaan L, Rodman A, Shimizu T. NEJM AI. 2026;3(5).

As AI integrates into clinical decisions, diagnostic errors may stem less from “automation bias” than from flawed information shaping reasoning. This commentary proposes a dynamic framework based on relative accuracy and complementary strengths, outlining four interaction zones—human-dominant, AI-dominant, hybrid review, and disagreement resolution—each demanding distinct workflows and continuous trust calibration.

About the CODEX Digest

Stay current with the CODEX Digest, which cuts through the noise to bring you a list of recent must-read publications handpicked by the Learning Hub team. Each edition features timely, relevant, and impactful journal articles, books, reports, studies, reviews, and more selected from the broader CODEX Collection—so you can spend less time searching and more time learning.
