Domain specific multimodal large language model for automated endoscopy reporting with multicenter prospective validation
npj Digital Medicine, Published online: 28 March 2026; doi:10.1038/s41746-026-02569-7
Data availability
Individual de-identified participant data underlying the results reported in this article can be shared with investigators for research purposes. Access can be requested from the first corresponding author, [email protected], and will be granted after signing a data access agreement. The pretrained model, software, source code used in the paper, and associated test data and parameters are available at https://github.com/endo-angel/MLLM-for-Automatically-Reporting-Lesions-of-Upper-GI-Endoscopy.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (grant no. 2022YFC2505105, to Lianlian Wu, W.Z.); the Natural Science Foundation of Wuhan (grant no. 2025040601020197, to Z.H.D.); the Hubei Provincial Key Laboratory Open Project (grant no. 2024KFZ005, to Z.H.D.); the Key Research and Development Program of Hubei Province (grant no. 2023BCB153, to H.G.Y.); and the National Natural Science Foundation of China Youth Science Fund (grant no. 82202257, to Lianlian Wu). The funders had no role in the study design, data collection, data analysis, interpretation, or manuscript preparation.
Author information
Author notes
- These authors contributed equally: Ruiqing Jiang, Boru Chen, Zehua Dong.
Authors and Affiliations
- Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, China
Ruiqing Jiang, Boru Chen, Zehua Dong, Xiaoquan Zeng, Hang You, Yanxia Li, Yunchao Deng, Ganggang Mu, Jing Wang, Li Huang, Jia Li, Du Cheng, Wei Zhou & Honggang Yu
- Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, China
Ruiqing Jiang, Boru Chen, Zehua Dong, Xiaoquan Zeng, Hang You, Yanxia Li, Yunchao Deng, Ganggang Mu, Jing Wang, Li Huang, Jia Li, Du Cheng, Wei Zhou & Honggang Yu
- Key Laboratory of Hubei Province for Digestive System Disease, Renmin Hospital of Wuhan University, Wuhan, China
Ruiqing Jiang, Boru Chen, Zehua Dong, Xiaoquan Zeng, Hang You, Yanxia Li, Yunchao Deng, Ganggang Mu, Jing Wang, Li Huang, Jia Li, Du Cheng, Wei Zhou & Honggang Yu
- Engineering Research Center for Artificial Intelligence Endoscopy Interventional Treatment of Hubei Province, Wuhan, China
Ruiqing Jiang, Boru Chen, Zehua Dong, Xiaoquan Zeng, Hang You, Yanxia Li, Yunchao Deng, Ganggang Mu, Jing Wang, Li Huang, Jia Li, Du Cheng, Wei Zhou & Honggang Yu
- Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan, China
Honggang Yu
Authors
- Ruiqing Jiang
- Boru Chen
- Zehua Dong
- Xiaoquan Zeng
- Hang You
- Yanxia Li
- Yunchao Deng
- Ganggang Mu
- Jing Wang
- Li Huang
- Jia Li
- Du Cheng
- Wei Zhou
- Honggang Yu
Contributions
Conceptualization: R.Q.J., B.R.C., Z.H.D. Methodology: R.Q.J., B.R.C., Z.H.D., X.Q.Z., H.Y. Investigation: R.Q.J., B.R.C., Z.H.D., X.Q.Z., H.Y., Y.X.L., Y.C.D., G.G.M., J.W., L.H., J.L., D.C., W.Z. Visualization: R.Q.J., B.R.C., Z.H.D., X.Q.Z. Funding acquisition: H.G.Y., W.Z., Z.H.D. Project administration: R.Q.J., B.R.C., Z.H.D., X.Q.Z. Supervision: H.G.Y., W.Z. Writing—original draft: R.Q.J., B.R.C., Z.H.D. Writing—review and editing: H.G.Y., W.Z., Z.H.D.
Corresponding authors
Correspondence to Wei Zhou or Honggang Yu.
Ethics declarations
Competing interests
Wuhan EndoAngel Co., Ltd. provided equipment for this study. The sponsor had no role in the design or conduct of the study; data collection, management, analysis, and interpretation; manuscript preparation; or the decision to submit the manuscript for publication. The authors declare no competing financial or non-financial interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jiang, R., Chen, B., Dong, Z. et al. Domain specific multimodal large language model for automated endoscopy reporting with multicenter prospective validation. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02569-7
- Received: 11 December 2025
- Accepted: 11 March 2026
- Published: 28 March 2026
- DOI: https://doi.org/10.1038/s41746-026-02569-7