
HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning

arXiv, March 31, 2026

Authors: Alexis Correa-Guillén, Carlos Gómez-Rodríguez, David Vilares


Abstract: We introduce HEAD-QA v2, an expanded and updated version of a Spanish/English healthcare multiple-choice reasoning dataset originally released by Vilares and Gómez-Rodríguez (2019). The update responds to the growing need for high-quality datasets that capture the linguistic and conceptual complexity of healthcare reasoning. We extend the dataset to over 12,000 questions from ten years of Spanish professional exams, benchmark several open-source LLMs using prompting, RAG, and probability-based answer selection, and provide additional multilingual versions to support future work. Results indicate that performance is mainly driven by model scale and intrinsic reasoning ability, with complex inference strategies obtaining limited gains. Together, these results establish HEAD-QA v2 as a reliable resource for advancing research on biomedical reasoning and model improvement.
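The abstract names probability-based answer selection without describing how it is done. As a rough illustration only, and not the authors' implementation, one common way to apply the idea to a multiple-choice question is to score each option by the log-probabilities a model assigns to its tokens and pick the highest-scoring option. The function names, the length-normalization switch, and the numbers below are all hypothetical; in practice the per-token log-probabilities would come from an LLM.

```python
from typing import Dict, List


def option_score(token_logprobs: List[float], normalize: bool = True) -> float:
    """Score one answer option from its per-token log-probabilities.

    With normalize=True the score is the mean log-probability, which avoids
    penalizing longer options; otherwise it is the plain sum.
    """
    if not token_logprobs:
        raise ValueError("each option needs at least one token log-probability")
    total = sum(token_logprobs)
    return total / len(token_logprobs) if normalize else total


def select_answer(option_logprobs: Dict[str, List[float]], normalize: bool = True) -> str:
    """Return the label of the option with the highest score."""
    return max(
        option_logprobs,
        key=lambda label: option_score(option_logprobs[label], normalize),
    )


# Hypothetical log-probabilities for a two-option question (not real model output).
example = {"A": [-0.9], "B": [-0.5, -0.5, -0.5]}

by_sum = select_answer(example, normalize=False)   # sums: A = -0.9, B = -1.5
by_mean = select_answer(example, normalize=True)   # means: A = -0.9, B = -0.5
```

In this toy case the two scoring rules disagree: summing favors the shorter option, while averaging favors the option whose tokens are individually more likely. Which rule a given evaluation uses is a design choice that the abstract does not specify.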

Comments: LREC 2026 camera-ready version

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2511.15355 [cs.CL]

(or arXiv:2511.15355v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2511.15355

arXiv-issued DOI via DataCite

Submission history

From: David Vilares
[v1] Wed, 19 Nov 2025 11:31:32 UTC (705 KB)
[v2] Mon, 30 Mar 2026 08:04:09 UTC (705 KB)

Original source

arXiv

https://arxiv.org/abs/2511.15355
