Constructing Composite Features for Interpretable Music-Tagging
arXiv:2603.28644v1 Announce Type: cross
Chenhao Xue, Weitao Hu, Joyraj Chakraborty, Zhijin Guo, Kang Li, Tianyu Shi, Martin Reed, Nikolaos Thomos
Abstract: Combining multiple audio features can improve the performance of music tagging, but common deep learning-based feature fusion methods often lack interpretability. To address this problem, we propose a Genetic Programming (GP) pipeline that automatically evolves composite features by mathematically combining base music features, thereby capturing synergistic interactions while preserving interpretability. This approach provides representational benefits similar to deep feature fusion without sacrificing interpretability. Experiments on the MTG-Jamendo and GTZAN datasets demonstrate consistent improvements over state-of-the-art systems across base feature sets at different levels of abstraction. Most of the performance gains occur within the first few hundred GP evaluations, indicating that effective feature combinations can be found under modest search budgets. The top evolved expressions include linear, nonlinear, and conditional forms, and several low-complexity solutions reach top performance, consistent with the parsimony pressure that favors simpler expressions. Analyzing these composite features further reveals which interactions and transformations tend to benefit tagging, offering insights that remain opaque in black-box deep models.
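The abstract does not specify the pipeline's primitive set, fitness function, or search operators. The sketch below is therefore only a rough, hypothetical illustration of the general technique: GP-based construction of a composite feature under parsimony pressure.

Everything in it is an assumption, not the authors' method:
- the names `evaluate`, `fitness`, and `evolve`;
- the primitive set: `add`/`sub` (linear), `mul`/protected `div` (nonlinear), and `if_pos` (conditional);
- the fitness: absolute Pearson correlation with a single binary tag, minus a per-node size penalty;
- the variation scheme: mutation-only with elitism and size-3 tournaments;
- the synthetic data.

```python
import math
import random

# Hypothetical primitive set: linear (add, sub), nonlinear (mul, div) and one
# conditional node ("if_pos"). Not the authors' actual function set.
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    # Protected division: returns 1.0 when the denominator is (near) zero.
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

def evaluate(tree, features):
    """Evaluate an expression tree on one vector of base features."""
    op = tree[0]
    if op == "x":  # terminal: index into the base-feature vector
        return features[tree[1]]
    if op == "const":  # terminal: numeric constant
        return tree[1]
    if op == "if_pos":  # (if_pos, cond, then, else)
        branch = tree[2] if evaluate(tree[1], features) > 0 else tree[3]
        return evaluate(branch, features)
    return BINARY[op](evaluate(tree[1], features), evaluate(tree[2], features))

def size(tree):
    """Node count, used as the complexity measure for parsimony pressure."""
    if tree[0] in ("x", "const"):
        return 1
    return 1 + sum(size(child) for child in tree[1:])

def random_tree(rng, n_features, depth):
    """Grow a random expression with at most `depth` levels of operators."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("x", rng.randrange(n_features))
        return ("const", round(rng.uniform(-1, 1), 2))
    if rng.random() < 0.1:
        return ("if_pos",) + tuple(random_tree(rng, n_features, depth - 1) for _ in range(3))
    op = rng.choice(list(BINARY))
    return (op, random_tree(rng, n_features, depth - 1), random_tree(rng, n_features, depth - 1))

def mutate(tree, rng, n_features):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if tree[0] in ("x", "const") or rng.random() < 0.3:
        return random_tree(rng, n_features, depth=2)
    children = list(tree[1:])
    i = rng.randrange(len(children))
    children[i] = mutate(children[i], rng, n_features)
    return (tree[0],) + tuple(children)

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for constant or non-finite inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) * (x - mx) for x in xs)
    vy = sum((y - my) * (y - my) for y in ys)
    if not (vx > 1e-12 and vy > 1e-12):
        return 0.0
    r = cov / math.sqrt(vx * vy)
    return r if math.isfinite(r) else 0.0

def fitness(tree, X, y, parsimony=0.01):
    """Hypothetical fitness: |corr(feature, tag)| minus a per-node penalty."""
    values = [evaluate(tree, row) for row in X]
    if not all(math.isfinite(v) for v in values):
        return -math.inf
    return abs(pearson(values, y)) - parsimony * size(tree)

def evolve(X, y, pop_size=60, generations=30, seed=0):
    """Mutation-only GP with elitism and size-3 tournament selection."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [random_tree(rng, n_features, depth=3) for _ in range(pop_size)]
    for _ in range(generations):
        scores = {t: fitness(t, X, y) for t in population}
        ranked = sorted(population, key=scores.get, reverse=True)
        next_gen = ranked[: pop_size // 5]  # elitism: keep the top 20% unchanged
        while len(next_gen) < pop_size:
            parent = max(rng.sample(population, 3), key=scores.get)
            next_gen.append(mutate(parent, rng, n_features))
        population = next_gen
    return max(population, key=lambda t: fitness(t, X, y))

if __name__ == "__main__":
    # Synthetic data: the hypothetical tag depends on an interaction of features 0 and 1.
    data_rng = random.Random(42)
    X = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
    y = [1.0 if row[0] * row[1] > 0 else 0.0 for row in X]
    best = evolve(X, y)
    print(best, round(fitness(best, X, y), 3))
```

In this toy setup, no single base feature correlates with the tag, but the product `("mul", ("x", 0), ("x", 1))` does. This mirrors the kind of synergistic interaction the abstract refers to. The `parsimony` coefficient trades correlation against expression size, so larger values push the search toward shorter, more readable expressions. A real tagging pipeline would more likely score candidates by downstream classifier performance than by correlation.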
Comments: 5 pages, 8 figures, accepted at ICASSP 2026
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM)
Cite as: arXiv:2603.28644 [cs.SD]
(or arXiv:2603.28644v1 [cs.SD] for this version)
https://doi.org/10.48550/arXiv.2603.28644
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Chenhao Xue
[v1] Mon, 30 Mar 2026 16:25:58 UTC (4,896 KB)