
Automating Early Disease Prediction Via Structured and Unstructured Clinical Data

arXiv, March 31, 2026

arXiv:2603.28167v1 (Announce Type: new)
Authors: Ane G. Domingo-Aldama, Marcos Merino Prado, Alain García Olea, Josu Goikoetxea, Koldo Gojenola, Aitziber Atutxa


Abstract: This study presents a fully automated methodology for early prediction studies in clinical settings, leveraging information extracted from unstructured discharge reports. The proposed pipeline uses discharge reports to support the three main steps of early prediction: cohort selection, dataset generation, and outcome labeling. By processing discharge reports with natural language processing techniques, we can efficiently identify relevant patient cohorts, enrich structured datasets with additional clinical variables, and generate high-quality labels without manual intervention. This approach addresses the frequent issue of missing or incomplete data in codified electronic health records (EHR), capturing clinically relevant information that is often underrepresented. We evaluate the methodology in the context of predicting atrial fibrillation (AF) progression, showing that predictive models trained on datasets enriched with discharge report information achieve higher accuracy and correlation with true outcomes compared to models trained solely on structured EHR data, while also surpassing traditional clinical scores. These results demonstrate that automating the integration of unstructured clinical text can streamline early prediction studies, improve data quality, and enhance the reliability of predictive models for clinical decision-making.
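The abstract does not say which natural language processing techniques the authors use. To illustrate how one text-processing layer can drive all three steps (cohort selection, dataset generation, outcome labeling), here is a minimal Python sketch that substitutes plain regular-expression matching for the paper's NLP. Everything in it is a hypothetical assumption for illustration: the `Patient` record, its field names, the keyword patterns, and the choice to define AF progression as a later report mentioning persistent or permanent AF. The matcher also ignores negation, so a phrase such as "no heart failure" would count as a positive mention; a real pipeline would need negation and context handling.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical patient record: codified EHR fields may be missing (None).
@dataclass
class Patient:
    patient_id: str
    age: Optional[int]
    hypertension: Optional[bool]   # structured field, may be missing
    heart_failure: Optional[bool]  # structured field, may be missing
    reports: list[str] = field(default_factory=list)  # discharge reports, oldest first


# Hypothetical keyword patterns standing in for the paper's (unspecified) NLP step.
# NOTE: no negation handling, so "no heart failure" still matches "heart failure".
PATTERNS = {
    "paroxysmal_af": re.compile(r"\bparoxysmal (atrial fibrillation|af)\b", re.I),
    "progressed_af": re.compile(r"\b(persistent|permanent) (atrial fibrillation|af)\b", re.I),
    "hypertension": re.compile(r"\b(hypertension|htn)\b", re.I),
    "heart_failure": re.compile(r"\bheart failure\b", re.I),
}


def mentions(text: str, key: str) -> bool:
    """Return True if the report text matches the named pattern."""
    return PATTERNS[key].search(text) is not None


def select_cohort(patients: list[Patient]) -> list[Patient]:
    """Step 1: keep patients whose baseline (first) report documents paroxysmal AF."""
    return [p for p in patients if p.reports and mentions(p.reports[0], "paroxysmal_af")]


def enrich(p: Patient) -> dict:
    """Step 2: fill only missing structured fields from the baseline report."""
    baseline = p.reports[0]
    features = {"age": p.age}
    for name in ("hypertension", "heart_failure"):
        value = getattr(p, name)
        # Structured values take precedence; text only fills gaps.
        features[name] = value if value is not None else mentions(baseline, name)
    return features


def label(p: Patient) -> int:
    """Step 3: label 1 if any later report documents progression, else 0."""
    return int(any(mentions(r, "progressed_af") for r in p.reports[1:]))


def build_dataset(patients: list[Patient]) -> list[tuple[str, dict, int]]:
    """Run all three steps: (patient_id, features, outcome) per cohort member."""
    return [(p.patient_id, enrich(p), label(p)) for p in select_cohort(patients)]


# Usage with made-up toy records (not data from the paper).
patients = [
    Patient("A", 71, None, False, [
        "Admitted for paroxysmal atrial fibrillation. History of HTN.",
        "Follow-up: persistent AF, rate control started.",
    ]),
    Patient("B", 64, True, None, ["Paroxysmal AF episode, converted spontaneously."]),
    Patient("C", 58, False, False, ["Chest pain, no arrhythmia found."]),
]

for row in build_dataset(patients):
    print(row)
```

In this sketch, patient C is excluded at cohort selection, A's missing hypertension flag is filled from "HTN" in the baseline report, and A is labeled positive because a later report mentions persistent AF. Structured values always win over text-derived ones, which is one plausible reading of "enrich structured datasets", not a claim about the authors' actual merge rules.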

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.28167 [cs.LG]

(or arXiv:2603.28167v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28167

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ane G. Domingo-Aldama
[v1] Mon, 30 Mar 2026 08:36:14 UTC (779 KB)
