Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessPerplexity launches Secure Intelligence Institute to advance AI security, privacy, and safety research - Moneycontrol.comGoogle News: AI SafetyAnthropic Source Code Leak Exposes AI Security Logic Before $350B IPO - startupfortune.comGoogle News: ClaudeBoy, 16, takes his own life after chilling ChatGPT question and 'farewell' texts - Daily StarGoogle News: ChatGPTGiving up on EA after 13 yearsLessWrong AIThe End of the "I Am Not a Robot" Box: Why Your Next Login Will Require 5 SquatsDEV CommunityInstagram DMs to Amazon Connect ChatDEV CommunityThe Nines Are Lying to You: What 99.9% Uptime Actually CostsDEV CommunityThe jury verdicts against Meta and YouTube recognized some platform design features as defective, distinct from what Section 230 was created to protect (Casey Newton/Platformer)TechmemeAnthropic code leak sparks renewed concerns over AI security and operational risks - CXO DigitalpulseGoogle News: AI SafetyBefore You Upgrade Hardware, Fix the SoftwareDEV Community2026년, Postman 버릴 때? Axios npm 공격 후 안전한 API 테스트 및 마이그레이션DEV CommunityAnthropic accidentally leaks part of Claude Code source - Latest news from AzerbaijanGoogle News: ClaudeBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessPerplexity launches Secure Intelligence Institute to advance AI security, privacy, and safety research - Moneycontrol.comGoogle News: AI SafetyAnthropic Source Code Leak Exposes AI Security Logic Before $350B IPO - startupfortune.comGoogle News: ClaudeBoy, 16, takes his own life after chilling ChatGPT question and 'farewell' texts - Daily StarGoogle News: ChatGPTGiving up on EA after 13 yearsLessWrong AIThe End of the "I Am Not a Robot" Box: Why Your Next Login Will Require 5 SquatsDEV CommunityInstagram DMs to Amazon Connect ChatDEV CommunityThe Nines Are Lying to You: What 99.9% Uptime Actually CostsDEV CommunityThe jury verdicts against Meta and YouTube recognized some platform design features as defective, distinct from what Section 230 was created to protect (Casey Newton/Platformer)TechmemeAnthropic code leak sparks renewed concerns over AI security and operational risks - CXO DigitalpulseGoogle News: AI SafetyBefore You Upgrade Hardware, Fix the SoftwareDEV Community2026년, Postman 버릴 때? Axios npm 공격 후 안전한 API 테스트 및 마이그레이션DEV CommunityAnthropic accidentally leaks part of Claude Code source - Latest news from AzerbaijanGoogle News: Claude

Automating Clinical Information Retrieval from Finnish Electronic Health Records Using Large Language Models

arXivMarch 30, 202610 min read0 views
Source Quiz

arXiv:2603.26434v1 Announce Type: new Abstract: Clinicians often need to retrieve patient-specific information from electronic health records (EHRs), a task that is time-consuming and error-prone. We present a locally deployable Clinical Contextual Question Answering (CCQA) framework that answers clinical questions directly from EHRs without external data transfer. Open-source large language models (LLMs) ranging from 4B to 70B parameters were benchmarked under fully offline conditions using 1,664 expert-annotated question-answer pairs derived from records of 183 patients. The dataset consiste — Mikko Saukkoriipi, Nicole Hernandez, Jaakko Sahlsten, Kimmo Kaski, Otso Arponen

View PDF HTML (experimental)

Abstract:Clinicians often need to retrieve patient-specific information from electronic health records (EHRs), a task that is time-consuming and error-prone. We present a locally deployable Clinical Contextual Question Answering (CCQA) framework that answers clinical questions directly from EHRs without external data transfer. Open-source large language models (LLMs) ranging from 4B to 70B parameters were benchmarked under fully offline conditions using 1,664 expert-annotated question-answer pairs derived from records of 183 patients. The dataset consisted predominantly of Finnish clinical text. In free-text generation, Llama-3.1-70B achieved 95.3% accuracy and 97.3% consistency across semantically equivalent question variants, while the smaller Qwen3-30B-A3B-2507 model achieved comparable performance. In a multiple-choice setting, models showed similar accuracy but variable calibration. Low-precision quantization (4-bit and 8-bit) preserved predictive performance while reducing GPU memory requirements and improving deployment feasibility. Clinical evaluation identified clinically significant errors in 2.9% of outputs, and semantically equivalent questions occasionally yielded discordant responses, including instances where one formulation was correct and the other contained a clinically significant error (0.96% of cases). These findings demonstrate that locally hosted open-source LLMs can accurately retrieve patient-specific information from EHRs using natural-language queries, while highlighting the need for validation and human oversight in clinical deployment.

Subjects:

Computation and Language (cs.CL)

ACM classes: I.2.7; I.5.4

Cite as: arXiv:2603.26434 [cs.CL]

(or arXiv:2603.26434v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26434

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mikko Saukkoriipi [view email] [v1] Fri, 27 Mar 2026 14:03:51 UTC (255 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Automating …researchpaperarxivnlplanguage-mo…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 226 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers