
Multidimensional evaluation of large language models in radiology report readability

nature.com, by Yunhai Mao, April 1, 2026

npj Digital Medicine, published online 01 April 2026; doi:10.1038/s41746-026-02589-3

Abstract

This study systematically investigated how demographic characteristics influence the readability of patient-centric radiology reports and compared the performance of different large language models (LLMs) in generating patient-centered reports. Using a sequential two-stage design, the research first conducted a retrospective evaluation of 320 radiology reports, followed by a clinical-setting validation with 800 patients. Results suggested that all three LLMs significantly improved the readability of radiology reports (P < 0.05), with DeepSeek-R1 showing potentially superior performance within this specific cohort. Demographic analysis revealed significant interaction effects: higher education, and older age within the same educational level, were associated with better comprehension. Clinical-setting validation further indicated that reading simplified reports could significantly improve patients' subjective and objective comprehension while significantly alleviating medical anxiety (P < 0.05). However, limitations persist, including inconsistent model outputs, missing anatomical details, and variations in comprehension driven by demographic factors. Consequently, LLMs should be integrated as auxiliary communication tools for radiologists rather than as standalone solutions, and personalized interventions tailored to specific demographic profiles are needed.
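The abstract reports readability gains without naming the metric used. Purely as an illustration, the sketch below assumes the classic Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, which are commonly used to score plain-English text. The helper names `count_syllables` and `readability` and the example report sentences are hypothetical and are not taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting runs of vowels (treating 'y' as a vowel)."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    # Drop a trailing silent 'e' ("spine"), but keep '-le' endings ("table").
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for plain English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return {
        # Higher is easier to read.
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Approximate US school grade needed to follow the text; lower is easier.
        "fk_grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


# Hypothetical example: an original impression and a simplified rewrite of it.
original = "Mild degenerative spondylosis with multilevel disc osteophyte complexes. No acute osseous abnormality."
simplified = "There is mild wear and tear in the spine. No broken bones are seen."

for label, text in (("original", original), ("simplified", simplified)):
    scores = readability(text)
    print(f"{label}: FRE={scores['flesch_reading_ease']:.1f}, FKGL={scores['fk_grade_level']:.1f}")
```

The vowel-group syllable counter is only a rough heuristic. Splitting sentences on `.`, `!` and `?` also miscounts text that contains decimals or abbreviations (for example "5.2 cm"), so a validated readability tool would be preferable for real report evaluation.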


Data availability

The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.


Acknowledgements

We gratefully acknowledge the Radiology Department of the Third Hospital of Jilin University for their support of this research, and Professor Mengchao Zhang on the research team.

Author information

Author notes

  • These authors contributed equally: Yunhai Mao, Chunyan Wang.

Authors and Affiliations

  • Department of Radiology, the Third Hospital of Jilin University, Changchun, China

Yunhai Mao, Chunyan Wang, Yuxin Li, Wei Wang & Mengchao Zhang


Contributions

M.Z. conceptualized the study, performed formal analysis and investigation, and was responsible for project administration and supervision. Y.M. and C.W. (equal contributors) contributed to data curation, formal analysis, methodology, validation, visualization, and wrote the original draft and revised the manuscript. Y.L. contributed to data curation, methodology, and visualization. W.W. contributed to methodology and writing the original draft. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Mengchao Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Mao, Y., Wang, C., Li, Y. et al. Multidimensional evaluation of large language models in radiology report readability. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02589-3
