Learning Diagnostic Reasoning for Decision Support in Toxicology
Abstract: Acute poly-substance intoxication requires rapid, life-saving decisions under substantial uncertainty, as clinicians must rely on incomplete ingestion details and nonspecific symptoms. Effective diagnostic reasoning in this chaotic environment requires fusing unstructured, non-medical narratives (e.g., paramedic scene descriptions and unreliable patient self-reports or known histories) with structured medical data such as vital signs. While Large Language Models (LLMs) show potential for processing such heterogeneous inputs, they struggle in this setting, often underperforming simple baselines that rely solely on patient histories. To address this, we present DeToxR (Decision-support for Toxicology with Reasoning), the first adaptation of Reinforcement Learning (RL) to emergency toxicology. We design a robust data-fusion engine for multi-label prediction across 14 substance classes, based on an LLM fine-tuned with Group Relative Policy Optimization (GRPO). We optimize the model's reasoning directly using a clinical performance reward: by formulating a multi-label agreement metric as the reward signal, the model is explicitly penalized both for missing co-ingested substances and for hallucinating absent poisons. Our model significantly outperforms its unadapted base LLM counterpart as well as supervised baselines. Furthermore, in a clinical validation study, the model shows a clinical advantage, outperforming an expert toxicologist in identifying the correct poisons (Micro-F1: 0.644 vs. 0.473). These results demonstrate the potential of RL-aligned LLMs to synthesize unstructured pre-clinical narratives and structured medical data for decision support in high-stakes environments.
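The reward the abstract describes, a multi-label agreement metric penalizing both missed co-ingestions and hallucinated poisons, can be read as a set-based F1 score over the predicted substance classes. Below is a minimal Python sketch of such a reward; the function name and the example class labels are illustrative assumptions, not the paper's implementation, which is not specified beyond the abstract.

```python
from typing import Set

# Hypothetical label space: the paper predicts 14 substance classes,
# but their exact taxonomy is not given in the abstract. These names
# are placeholders for illustration only.
SUBSTANCE_CLASSES: Set[str] = {
    "opioids", "benzodiazepines", "ethanol", "stimulants",
    # ... remaining classes of the 14-class taxonomy ...
}

def multilabel_agreement_reward(predicted: Set[str], actual: Set[str]) -> float:
    """F1-style agreement between predicted and true substance sets.

    False negatives (missed co-ingested substances) lower recall;
    false positives (hallucinated absent poisons) lower precision;
    both therefore reduce the reward, matching the behavior the
    abstract describes.
    """
    tp = len(predicted & actual)   # correctly identified substances
    fp = len(predicted - actual)   # hallucinated poisons
    fn = len(actual - predicted)   # missed co-ingestions
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

GRPO, which the abstract names as the fine-tuning algorithm, turns such per-completion rewards into advantages by normalizing within a group of completions sampled for the same prompt, with no separate value model. A sketch of that group-relative step, following the standard GRPO formulation rather than anything stated in the paper:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standard GRPO advantage: each sampled completion's reward is
    standardized against the mean and std of its own sampling group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0.0:  # all completions scored alike: no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]
```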
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.29608 [cs.CL]
(or arXiv:2603.29608v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29608
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: David Bani-Harouni [v1] Tue, 31 Mar 2026 11:26:45 UTC (308 KB)