Integrating large language models for enhanced predictive analytics in healthcare
npj Digital Medicine, Published online: 02 April 2026; doi:10.1038/s41746-026-02572-y
References
- Woolf, S. H. et al. Promoting informed choice: transforming health care to dispense knowledge for decision making (2005).
- Kaur, S. et al. Medical diagnostic systems using artificial intelligence (AI) algorithms: principles and perspectives. IEEE Access 8, 228049–228069 (2020).
- Graber, M. L. The incidence of diagnostic error in medicine. BMJ Qual. Saf. 22, ii21–ii27 (2013).
- Stern, S. D. Symptom to Diagnosis an Evidence-Based Guide (McGraw-Hill Education, 2010).
- Achour, S. L., Dojat, M., Rieux, C., Bierling, P. & Lepage, E. A UMLS-based knowledge acquisition tool for rule-based clinical decision support system development. J. Am. Med. Inform. Assoc. 8, 351–360 (2001).
- Papadopoulos, P., Soflano, M., Chaudy, Y., Adejo, W. & Connolly, T. M. A systematic review of technologies and standards used in the development of rule-based clinical decision support systems. Health Technol. 12, 713–727 (2022).
- Riley, R. D. & Collins, G. S. Stability of clinical prediction models developed using statistical or machine learning methods. Biometrical J. 65, 2200302 (2023).
- Eloranta, S. & Boman, M. Predictive models for clinical decision making: Deep dives in practical machine learning. J. Intern. Med. 292, 278–295 (2022).
- Shouval, R. et al. Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT. Bone Marrow Transplant. 49, 332–337 (2014).
- Zhong, Z. et al. ABN-BLIP: Abnormality-aligned bootstrapping language-image pre-training for pulmonary embolism diagnosis and report generation from CTPA. Med. Image Anal. 107, 103786 (2026).
- Giesa, N. et al. Applying a transformer architecture to intraoperative temporal dynamics improves the prediction of postoperative delirium. Commun. Med. 4, 251 (2024).
- Xu, Y., Xu, S., Ramprassad, M., Tumanov, A. & Zhang, C. TransEHR: Self-supervised transformer for clinical time series data. In Machine Learning for Health (ML4H), 623–635 (PMLR, 2023).
- Oh, J., Wang, J. & Wiens, J. Learning to exploit invariances in clinical time-series data using sequence transformer networks. In Machine Learning for Healthcare Conference, 332–347 (PMLR, 2018).
- Guo, H. et al. A multitask framework for automated interpretation of multi-frame right upper quadrant ultrasound in clinical decision support. arXiv preprint arXiv:2601.12174 (2026).
- Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186 (2019).
- Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2020).
- Achiam, J. et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
- Radford, A., Narasimhan, K., Salimans, T., Sutskever, I. et al. Improving language understanding by generative pre-training. arXiv preprint (2018).
- Huang, K., Altosaar, J. & Ranganath, R. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv preprint arXiv:1904.05342 (2019).
- Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
- Yang, X. et al. GatorTron: A large clinical language model to unlock patient information from unstructured electronic health records. arXiv preprint arXiv:2203.03540 (2022).
- Luo, R. et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief. Bioinform. 23, bbac409 (2022).
- Chen, C. et al. Integration of large language models and federated learning. Patterns 5 (2024).
- Kokash, N. et al. Ontology- and LLM-based data harmonization for federated learning in healthcare. arXiv preprint arXiv:2505.20020 (2025).
- Nascimento, L. et al. Federated large language models in healthcare: a systematic review, opportunities and challenges. Eng. Archive (2025).
- Nguyen, D.-T. et al. Federated learning for renal tumor segmentation and classification on multi-center mri dataset. J. Magn. Reson. Imaging 62, 814–824 (2025).
- Floridi, L. & Chiriatti, M. GPT-3: Its nature, scope, limits, and consequences. Minds Mach. 30, 681–694 (2020).
- Pan, T., Shen, J. & Xu, M. Enhancing the performance of neurosurgery medical question-answering systems using a multi-task knowledge graph-augmented answer generation model. Front. Neurosci. 19, 1606038 (2025).
- Xu, L. et al. End-to-end knowledge-routed relational dialogue system for automatic diagnosis. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 7346–7353 (2019).
- Liu, W. et al. MedDG: A large-scale medical consultation dataset for building medical dialogue system. arXiv preprint (2020).
- Martino, A., Iannelli, M. & Truong, C. Knowledge injection to counter large language model (LLM) hallucination. In European Semantic Web Conference, 182–185 (Springer, 2023).
- Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
- Touvron, H. et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
- Liu, X. et al. A generalist medical language model for disease diagnosis assistance. Nat. Med. 31, 932–942 (2025).
- Kirk, H. R., Vidgen, B., Röttger, P. & Hale, S. A. The benefits, risks and bounds of personalizing the alignment of large language models to individuals. Nat. Mach. Intell. 6, 383–392 (2024).
- Sutton, N. R. et al. Coronary artery disease evaluation and management considerations for high-risk occupations: commercial vehicle drivers and pilots. Circ. Cardiovasc. Interv. 14, e009950 (2021).
- Righini, M. et al. The simplified pulmonary embolism severity index (PESI): validation of a clinical prognostic model for pulmonary embolism. J. Thromb. Haemost. 9, 2115–2117 (2011).
- Budoff, M. J. et al. Ten-year association of coronary artery calcium with atherosclerotic cardiovascular disease (ASCVD) events: the Multi-Ethnic Study of Atherosclerosis (MESA). Eur. Heart J. 39, 2401–2408 (2018).
- Guo, D. et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025).
- Team, G. et al. Gemma 3 technical report. arXiv preprint arXiv:2503.19786 (2025).
- Tu, T. et al. Towards generalist biomedical AI. NEJM AI 1, AIoa2300138 (2024).
- Toma, A. et al. Clinical Camel: An open expert-level medical language model with dialogue-based knowledge encoding. arXiv preprint arXiv:2305.12031 (2023).
- Zhao, L. et al. Artificial intelligence-based lesion characterization and outcome prediction of prostate cancer on [18F]DCFPyL PSMA imaging. Radiother. Oncol. 111265 (2025).
- Wu, J., Roy, J. & Stewart, W. F. Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Med. Care 48, S106–S113 (2010).
- Bernstein, I. A. et al. Comparison of ophthalmologist and large language model chatbot responses to online patient eye care questions. JAMA Netw. Open 6, e2330320–e2330320 (2023).
- Xu, F. et al. Are large language models really good logical reasoners? A comprehensive evaluation and beyond. IEEE Trans. Knowl. Data Eng. (2025).
- Wang, C. et al. Survey on factuality in large language models. ACM Comput. Surv. 58, 1–37 (2025).
- Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613–2622 (2024).
- Shamout, F., Zhu, T. & Clifton, D. A. Machine learning for clinical outcome prediction. IEEE Rev. Biomed. Eng. 14, 116–126 (2020).
- Kim, J. I. et al. Machine learning for antimicrobial resistance prediction: current practice, limitations, and clinical perspective. Clin. Microbiol. Rev. 35, e00179–21 (2022).
- Rajkomar, A., Dean, J. & Kohane, I. Machine learning in medicine. N. Engl. J. Med. 380, 1347–1358 (2019).
- Beam, A. L. & Kohane, I. S. Big data and machine learning in health care. JAMA 319, 1317–1318 (2018).
- Perez, E., Kiela, D. & Cho, K. True few-shot learning with language models. Adv. Neural Inf. Process. Syst. 34, 11054–11070 (2021).
- Zhang, C., Morris, J. X. & Shmatikov, V. Extracting prompts by inverting llm outputs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 14753–14777 (2024).
- Huang, L. et al. A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. ACM Trans. Inform. Syst. 43, 1–55 (2025).
- Mahajan, A., Obermeyer, Z., Daneshjou, R., Lester, J. & Powell, D. Cognitive bias in clinical large language models. npj Digital Med. 8, 428 (2025).
- Suenghataiphorn, T., Tribuddharat, N., Danpanichkul, P. & Kulthamrongsri, N. Bias in large language models across clinical applications: A systematic review. arXiv preprint arXiv:2504.02917 (2025).
- Hsu, W.-C. et al. MRI-based ovarian lesion classification via a foundation segmentation model and multimodal analysis: A multicenter study. Radiology 316, e243412 (2025).
- Wu, J. et al. Vision-language foundation model for 3d medical imaging. npj Artif. Intell. 1, 17 (2025).
- Zhong, Z. et al. Vision-language model for report generation and outcome prediction in CT pulmonary angiogram. npj Digital Med. 8, 432 (2025).
- Huang, Z. et al. A pathologist–ai collaboration framework for enhancing diagnostic accuracies and efficiencies. Nat. Biomed. Eng. 9, 455–470 (2025).
- Huang, X. et al. Understanding the planning of LLM agents: A survey. arXiv preprint arXiv:2402.02716 (2024).
- Zhao, A. et al. ExpeL: LLM agents are experiential learners. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, 19632–19642 (2024).
- Mosqueira-Rey, E., Hernández-Pereira, E., Alonso-Ríos, D., Bobes-Bascarán, J. & Fernández-Leal, Á. Human-in-the-loop machine learning: a state of the art. Artif. Intell. Rev. 56, 3005–3054 (2023).
- Cook, R. J., Zeng, L. & Yi, G. Y. Marginal analysis of incomplete longitudinal binary data: a cautionary note on LOCF imputation. Biometrics 60, 820–828 (2004).
- Xue, H. & Salim, F. D. PromptCast: A new prompt-based learning paradigm for time series forecasting. IEEE Trans. Knowl. Data Eng. 36, 6851–6864 (2023).
- Liu, H., Zhao, Z., Wang, J., Kamarthi, H. & Prakash, B. A. LSTPrompt: Large language models as zero-shot time series forecasters by long-short-term prompting. In Findings of the Association for Computational Linguistics: ACL 2024, 7832–7840 (2024).
- Moon, H. C., Joty, S. & Chi, X. GradMask: Gradient-guided token masking for textual adversarial example detection. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3603–3613 (2022).
- Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).
- Dwivedi, A. K., Mallawaarachchi, I. & Alvarado, L. A. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method. Stat. Med. 36, 2187–2205 (2017).
- Tong, X. et al. A novel subpixel phase correlation method using singular value decomposition and unified random sample consensus. IEEE Trans. Geosci. Remote Sens. 53, 4143–4156 (2015).
- Naidu, K., Beenen, E., Gananadha, S. & Mosse, C. The yield of fever, inflammatory markers and ultrasound in the diagnosis of acute cholecystitis: a validation of the 2013 Tokyo Guidelines. World J. Surg. 40, 2892–2897 (2016).