*Result*: Vision-language models in diagnostic imaging: review of technical advances, clinical validation, and practical deployment.
*Further Information*
*Background: Radiology faces an unprecedented workload crisis, creating demand for AI solutions to enhance efficiency and quality. Vision-language models (VLMs) represent a paradigm shift from narrow AI tools to integrated systems for image interpretation and report generation. However, their rapid technical progress has outpaced rigorous clinical validation, creating a critical gap between their theoretical potential and safe, practical deployment.
Objective: To critically review the current state of VLMs in diagnostic imaging by evaluating their clinical validation, identifying deployment challenges, and assessing their impact on the radiological workflow. This review provides a roadmap for responsible clinical integration by analyzing the gap between model performance and real-world utility.
Method: A narrative review of the literature published between January 2017 and May 2025 was conducted. The search focused on VLM applications in radiology, including automated report generation and visual question answering. We synthesized findings from technical and clinical validation studies and organized them thematically around architectural evolution, applications, validation, and implementation barriers.
Results: A clear progression from encoder-decoder models to sophisticated LLM-integrated foundation models was identified. While these models achieve high performance on NLP metrics, their clinical utility is limited. Key findings include: (1) pervasive model hallucination, with factual errors in ∼22% of AI-generated reports; (2) a lack of external validation on diverse, multi-institutional datasets; and (3) significant implementation barriers, including high computational costs, poor workflow integration, and unresolved liability. Human expert evaluations show that while AI-generated reports for routine cases are often acceptable (77.7% in one study), accuracy declines significantly in complex cases.
Conclusion: VLMs hold transformative potential but are not ready for autonomous clinical use. Their primary value lies in augmenting radiologists' workflow. For successful adoption, the field must shift focus from algorithmic metrics to proving clinical safety and efficacy through rigorous validation, developing robust hallucination mitigation strategies, and designing seamless workflow integrations.
(Copyright © 2025 Elsevier B.V. All rights reserved.)*
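The gap between high NLP-metric scores and limited clinical utility can be illustrated with a toy example. The sketch below implements a simplified sentence-level BLEU (uniform weights over unigrams and bigrams, a brevity penalty, and no smoothing); this configuration and the two report sentences are illustrative assumptions, not the metric set-up or data of any study covered by the review. The candidate reverses the reference finding (no pneumothorax versus a pneumothorax) by changing a single token.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision as in BLEU: each candidate n-gram is
    credited at most as many times as it occurs in the reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of clipped precisions
    for n = 1..max_n times a brevity penalty. No smoothing, so any zero
    precision yields a score of 0."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical report sentences: the candidate asserts the opposite finding.
reference = "the heart size is normal and there is no pneumothorax".split()
candidate = "the heart size is normal and there is a pneumothorax".split()

print(round(bleu(candidate, reference), 3))  # 0.837
print(bleu(reference, reference))            # 1.0
```

Nine of ten unigrams and seven of nine bigrams still match, so the clinically opposite report scores about 0.84 out of 1.0. Overlap-based metrics reward surface similarity and barely register a negation flip, which is why clinically oriented evaluation, such as the human expert review reported above, is needed to detect factual errors.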
*Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.*