HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
arXiv:2511.15355v2 Announce Type: replace
Authors: Alexis Correa-Guillén, Carlos Gómez-Rodríguez, David Vilares
Abstract: We introduce HEAD-QA v2, an expanded and updated version of a Spanish/English healthcare multiple-choice reasoning dataset originally released by Vilares and Gómez-Rodríguez (2019). The update responds to the growing need for high-quality datasets that capture the linguistic and conceptual complexity of healthcare reasoning. We extend the dataset to over 12,000 questions from ten years of Spanish professional exams, benchmark several open-source LLMs using prompting, RAG, and probability-based answer selection, and provide additional multilingual versions to support future work. Results indicate that performance is mainly driven by model scale and intrinsic reasoning ability, with complex inference strategies obtaining limited gains. Together, these results establish HEAD-QA v2 as a reliable resource for advancing research on biomedical reasoning and model improvement.
Comments: LREC 2026 camera-ready version
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2511.15355 [cs.CL]
(or arXiv:2511.15355v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2511.15355
arXiv-issued DOI via DataCite
Submission history
From: David Vilares
[v1] Wed, 19 Nov 2025 11:31:32 UTC (705 KB)
[v2] Mon, 30 Mar 2026 08:04:09 UTC (705 KB)