CODEX Digest - 7.10.25


Want this delivered straight to your inbox every Thursday? Subscribe now.

This week's digest explores how artificial intelligence (AI) can best support diagnostic excellence, highlighting the importance of human-AI collaboration, system-level design, and patient trust. Also featured are new findings on diagnostic errors in hospital and emergency settings, insights into how cognitive bias and communication affect clinical reasoning, and international perspectives on the use of AI.

Here are this week's must-reads: 

Titles link to the PubMed record or free-to-access sites with full text availability.

The optimization paradox in clinical AI multi-agent systems.

Bedi S, Mlauzi I, Shin D et al. arXiv. Epub 2025 Jun 12.  

Safe systems are developed with an eye toward understanding both overarching and distinct unit capabilities. After evaluating single-agent systems against multi-agent systems, this preprint (not yet peer reviewed) concludes that making healthcare artificial intelligence (AI) systems reliable requires more than enhancing individual components; attention must also go to how information moves between systems so that their functions stay aligned, effective, and safe.

Multinational attitudes toward AI in health care and diagnostics among hospital patients. 

Busch F, Hoffmann L, Xu L et al. JAMA Netw Open. 2025;8(6):e2514452.  

Patient acceptance of AI is important to support successful AI implementation across healthcare. This study examines how patients across 43 countries on six continents feel about the use of AI for tasks such as cancer imaging and second opinions, and how they perceive its accuracy. The results show that patients generally trust the use of AI in diagnosis, particularly when it is combined with physician-led decision-making. Attitudes varied, however, with patient demographics, health status, and comfort with technology.

The effect of a provisional diagnosis on intern diagnostic reasoning: a mixed methods study. (subscription required)

Clary C, Cohen A, Kumar S et al. Diagnosis (Berl). 2025;12(2):208–216.

Confirmation bias can affect diagnosis when clinicians are told a diagnosis is correct and do not explore other options. This study found that communicating an early, accurate diagnostic decision can both help and hurt how interns think through clinical situations. While a correct, communicated provisional diagnosis may lead to better decisions, it can also create blind spots in the reasoning process. Teaching learners to reflect on their thinking holds promise for improving diagnostic reasoning.

Human factors in diagnostic radiology: practical challenges and cognitive biases. (subscription required) 

Cowen JE, Vigneswaran G, Bekker J et al. Eur J Radiol. 2025;190:112248. 

The interface between clinicians and diagnostic technologies should account for human factors to limit environmental and unintended cognitive consequences that reduce safety. This review discusses mechanisms to reduce distractions and apply cognitive debiasing methods in daily work to support diagnostic decision-making.

Adverse diagnostic events in hospitalised patients: a single-centre, retrospective cohort study. (subscription required)

Dalal AK, Plombon S, Konieczny K et al. BMJ Qual Saf. 2025;34(6):377–388.    

Surveillance strategies are a valuable mechanism for tracking diagnostic errors. This study uses the Safer Dx instrument and the Diagnostic Error Evaluation and Research (DEER) Taxonomy to define and categorize diagnostic errors at one hospital. A calculation based on weighted samples suggests that 1 in 14 hospitalized patients may be harmed by diagnostic failures, the majority of which were preventable. Delays were a common source of harmful diagnostic errors. Process evaluation methods that sample higher-risk cases for retrospective review may reveal new insights for safer hospital systems.

Comparative analysis of large language models in clinical diagnosis: performance evaluation across common and complex medical cases.

Dinc MT, Bardak AE, Bahar F et al. JAMIA Open. 2025;8(3):ooaf055.

Real-world case data are valuable for examining the application of AI to diagnosis in active care. This study evaluates the diagnostic capabilities of advanced language models by providing patient information in stages, mirroring how information is gathered in clinical workflows. While the AI tools generated accurate results across a wide spectrum of cases, diagnostic accuracy was higher for common cases and lower for more complex ones. More work is needed to determine how to implement AI safely and effectively in real-world care and medical training.

Sepsis: Investigating Under the Patient Safety Incident Response Framework (PSIRF).

Health Services Safety Investigations Body; 2025.

Missed or delayed diagnosis of sepsis is a persistent, life-threatening barrier to safe care worldwide. This series of reports from the United Kingdom analyzes sepsis in three cases: a patient with a urinary infection, a patient with abdominal pain, and a patient with diabetes and a foot infection. Each report details organizational contributors to sepsis diagnostic failures and how incident data could inform improvement.

Potential diagnostic error for emergency conditions, mortality, and healthy days at home.

Lin MP, Burke RC, Sabbatini AK et al. JAMA Netw Open. 2025;8(6):e2516400.  

Emergency departments (EDs) harbor conditions that contribute to diagnostic error. This national study examines Medicare beneficiaries who were hospitalized within nine days of an ED visit for 1 of 10 high-risk conditions. The results document the rates of potential diagnostic error and show that they are associated with higher mortality and fewer healthy days at home. While the overall rate of potential diagnostic error was only 3.2%, it varied by condition, with meningitis and spinal abscess more frequently associated with diagnostic errors.

Evaluating large language models for drafting emergency department encounter summaries. 

Williams CYK, Bains J, Tang T et al. PLOS Digit Health. 2025;4:e0000899.

AI has the potential to assist with administrative tasks, but its effectiveness in healthcare is still developing. This study examines the accuracy of GPT tools in documenting patient-clinician conversations upon presentation at the emergency department. The authors found inaccuracies, hallucinations, and omissions in the summaries reviewed. While the potential for harm was low, errors in summaries may contribute to misinformation that harms patients. Clinicians must review AI-generated documentation of clinical conversations to ensure that correct information is captured and care excellence is supported.

Human-AI collectives most accurately diagnose clinical vignettes.

Zöller N, Berger J, Lin I et al. Proc Natl Acad Sci USA. 2025;122:e2426153122.

AI is a promising contributor to diagnostic excellence. This article describes a process that combines large language model analysis with human decision makers and assesses the accuracy of various combinations of machine and human intelligence. The analysis found that pairing technical and human outputs was most effective for complex clinical decision-making.

**This was a recent Editor's Pick

About the CODEX Digest

Stay current with the CODEX Digest, which cuts through the noise to bring you a list of recent must-read publications handpicked by the Learning Hub team. Each edition features timely, relevant, and impactful journal articles, books, reports, studies, reviews, and more selected from the broader CODEX Collection—so you can spend less time searching and more time learning.

Get the latest in diagnostic excellence, curated and delivered straight to your inbox every week:

Subscribe Now

See past digests here