Result: Vision-language models in diagnostic imaging: review of technical advances, clinical validation, and practical deployment.

Title:

Vision-language models in diagnostic imaging: review of technical advances, clinical validation, and practical deployment.

Authors:

Dutta N; Department of Radiodiagnosis, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India., Bose K; Department of Radiodiagnosis, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India., Syailendra E; The Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States., Chu L; The Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States., Gupta P; Department of Radiodiagnosis, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India. Electronic address: pankajgupta959@gmail.com.

Source:

International journal of medical informatics [Int J Med Inform] 2026 Mar 15; Vol. 208, pp. 106227. Date of Electronic Publication: 2025 Dec 28.

Publication Type:

Journal Article; Review

Language:

English

Journal Info:

Publisher: Elsevier Science Ireland Ltd; Country of Publication: Ireland; NLM ID: 9711057; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1872-8243 (Electronic); Linking ISSN: 13865056; NLM ISO Abbreviation: Int J Med Inform; Subsets: MEDLINE

Imprint Name(s):

Original Publication: Shannon, Co. Clare, Ireland : Elsevier Science Ireland Ltd., c1997-

MeSH Terms:

Diagnostic Imaging*/methods; Image Interpretation, Computer-Assisted*/methods; Artificial Intelligence*; Humans; Workflow

Contributed Indexing:

Keywords: Artificial intelligence; Clinical validation; Foundation model; Large language models; Medical imaging; Radiology; Report generation; Vision-language models

Entry Date(s):

Date Created: 20260103; Date Completed: 20260116; Latest Revision: 20260116

Update Code:

20260130

DOI:

10.1016/j.ijmedinf.2025.106227

PMID:

41483727

Database:

MEDLINE

Further Information

*Background: Radiology faces an unprecedented workload crisis, creating demand for AI solutions to enhance efficiency and quality. Vision-language models (VLMs) represent a paradigm shift from narrow AI tools to integrated systems for image interpretation and report generation. However, their rapid technical progress has outpaced rigorous clinical validation, creating a critical gap between their theoretical potential and safe, practical deployment.
Objective: To critically review the state of VLMs in diagnostic imaging by evaluating their clinical validation, identifying deployment challenges, and assessing their impact on the radiological workflow. This review provides a roadmap for responsible clinical integration by analyzing the gap between model performance and real-world utility.
Method: A narrative review was conducted of literature published from January 2017 to May 2025. The search focused on VLM applications in radiology, including automated report generation and visual question answering. We synthesized findings from technical and clinical validation studies, organized thematically around architectural evolution, applications, validation, and implementation barriers.
Results: A clear progression from encoder-decoder models to sophisticated LLM-integrated foundation models was identified. While these models achieve high performance on NLP metrics, their clinical utility is limited. Key findings include: (1) pervasive model hallucination, with factual errors in ~22% of AI-generated reports; (2) a lack of external validation on diverse, multi-institutional datasets; and (3) significant implementation barriers, including high computational costs, poor workflow integration, and unresolved liability. Human expert evaluations show that while AI-generated reports for routine cases are often acceptable (77.7% in one study), accuracy declines significantly in complex cases.
Conclusion: VLMs hold transformative potential but are not ready for autonomous clinical use. Their primary value lies in augmenting radiologists' workflow. For successful adoption, the field must shift focus from algorithmic metrics to proving clinical safety and efficacy through rigorous validation, developing robust hallucination mitigation strategies, and designing seamless workflow integrations.
(Copyright © 2025 Elsevier B.V. All rights reserved.)*

*Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.*
