HumMusQA: A Human-written Music Understanding QA Benchmark Dataset
Authors: Benno Weck, Pablo Puentes, Andrea Poltronieri, Satyajeet Prabhu, Dmitry Bogdanov
Abstract: The evaluation of music understanding in Large Audio-Language Models (LALMs) requires a rigorously defined benchmark that truly tests whether models can perceive and interpret music, a standard that current data methodologies frequently fail to meet. This paper introduces a meticulously structured approach to music evaluation, proposing a new dataset of 320 hand-written questions curated and validated by experts with musical training, arguing that such focused, manual curation is superior for probing complex audio comprehension. To demonstrate the use of the dataset, we benchmark six state-of-the-art LALMs and additionally test their robustness to uni-modal shortcuts.
Comments: Dataset available at this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
Cite as: arXiv:2603.27877 [cs.CL]
(or arXiv:2603.27877v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.27877
arXiv-issued DOI via DataCite (pending registration)
Journal reference: Proceedings of the 4th Workshop on NLP for Music and Audio (NLP4MusA 2026), pages 58-67, Rabat, Morocco. Association for Computational Linguistics
Related DOI: https://doi.org/10.18653/v1/2026.nlp4musa-1.9
Submission history
From: Benno Weck
[v1] Sun, 29 Mar 2026 21:33:07 UTC (815 KB)