
LLM-Assisted Emergency Triage Benchmark: Bridging Hospital-Rich and MCI-Like Field Simulation

arXiv, March 31, 2026

Authors: Joshua Sebastian, Karma Tobden, KMA Solaiman


Abstract: Research on emergency and mass casualty incident (MCI) triage has been limited by the absence of openly usable, reproducible benchmarks. Yet these scenarios demand rapid identification of the patients most in need, where accurate deterioration prediction can guide timely interventions. While the MIMIC-IV-ED database is openly available to credentialed researchers, transforming it into a triage-focused benchmark requires extensive preprocessing, feature harmonization, and schema alignment -- barriers that restrict accessibility to only highly technical users. We address these gaps by first introducing an open, LLM-assisted emergency triage benchmark for deterioration prediction (ICU transfer, in-hospital mortality). The benchmark then defines two regimes: (i) a hospital-rich setting with vitals, labs, notes, chief complaints, and structured observations, and (ii) an MCI-like field simulation limited to vitals, observations, and notes. Large language models (LLMs) contributed directly to dataset construction by (i) harmonizing noisy fields such as AVPU and breathing devices, (ii) prioritizing clinically relevant vitals and labs, and (iii) guiding schema alignment and efficient merging of disparate tables. We further provide baseline models and SHAP-based interpretability analyses, illustrating predictive gaps between regimes and the features most critical for triage. Together, these contributions make triage prediction research more reproducible and accessible -- a step toward dataset democratization in clinical AI.
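To make the two regimes and the field-harmonization step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the abstract says LLMs guided the harmonization, whereas this sketch substitutes a plain lookup table. The synonym table, the feature-group keys, and the function names harmonize_avpu and select_features are all hypothetical; only the AVPU scale itself and the feature groups listed per regime come from the abstract.

```python
from typing import Optional

# Hypothetical synonym table mapping noisy free-text entries to the four
# AVPU levels (Alert, Voice, Pain, Unresponsive). The source does not
# publish its actual mapping; this is an assumption for illustration.
AVPU_SYNONYMS = {
    "a": "A", "alert": "A",
    "v": "V", "voice": "V", "verbal": "V", "responds to voice": "V",
    "p": "P", "pain": "P", "responds to pain": "P",
    "u": "U", "unresponsive": "U",
}

# Feature groups per regime, as listed in the abstract. The dictionary
# keys are illustrative names, not the benchmark's real column names.
REGIME_GROUPS = {
    "hospital_rich": {"vitals", "labs", "notes", "chief_complaints", "observations"},
    "mci_like": {"vitals", "observations", "notes"},
}


def harmonize_avpu(raw: Optional[str]) -> Optional[str]:
    """Map a noisy AVPU entry to A, V, P, or U; return None if unrecognized."""
    if raw is None:
        return None
    # Lowercase and collapse repeated whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    return AVPU_SYNONYMS.get(key)


def select_features(record: dict, regime: str) -> dict:
    """Keep only the feature groups that the given regime allows."""
    allowed = REGIME_GROUPS[regime]
    return {group: value for group, value in record.items() if group in allowed}
```

For example, select_features(record, "mci_like") drops the labs and chief_complaints groups from a record, which mirrors the reduced information available in the field simulation.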

Comments: Submitted to GenAI4Health@NeurIPS 2025. This was the first version of the LLM-assisted emergency triage benchmark dataset and baseline models. A related but separate benchmark-focused study on emergency triage under constrained sensing has been accepted at the IEEE International Conference on Healthcare Informatics (ICHI) 2026 (see arXiv:2602.20168)

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2509.26351 [cs.LG] (or arXiv:2509.26351v2 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2509.26351 (arXiv-issued DOI via DataCite)

Submission history

From: Kma Solaiman
[v1] Tue, 30 Sep 2025 14:54:58 UTC (398 KB)
[v2] Mon, 30 Mar 2026 10:47:55 UTC (398 KB)
