Can pre-trained Deep Learning models predict groove ratings?
arXiv:2603.27237v1 Announce Type: cross Abstract: This study explores the extent to which deep learning models can predict groove and its related perceptual dimensions directly from audio signals. We critically examine the effectiveness of seven state-of-the-art deep learning models in predicting groove ratings and responses to groove-related queries through the extraction of audio embeddings. Additionally, we compare these predictions with traditional handcrafted audio features. To better understand the underlying mechanics, we extend this methodology to analyze predictions based on source-se — Axel Marmoret, Nicolas Farrugia, Jan Alexander Stupacher
View PDF HTML (experimental)
Abstract:This study explores the extent to which deep learning models can predict groove and its related perceptual dimensions directly from audio signals. We critically examine the effectiveness of seven state-of-the-art deep learning models in predicting groove ratings and responses to groove-related queries through the extraction of audio embeddings. Additionally, we compare these predictions with traditional handcrafted audio features. To better understand the underlying mechanics, we extend this methodology to analyze predictions based on source-separated instruments, thereby isolating the contributions of individual musical elements. Our analysis reveals a clear separation of groove characteristics driven by the underlying musical style of the tracks (funk, pop, and rock). These findings indicate that deep audio representations can successfully encode complex, style-dependent groove components that traditional features often miss. Ultimately, this work highlights the capacity of advanced deep learning models to capture the multifaceted concept of groove, demonstrating the strong potential of representation learning to advance predictive Music Information Retrieval methodologies.
Comments: Submitted to the SMC 2026 conference. 3 figures and 2 tables
Subjects:
Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
ACM classes: H.5.5
Cite as: arXiv:2603.27237 [cs.SD]
(or arXiv:2603.27237v1 [cs.SD] for this version)
https://doi.org/10.48550/arXiv.2603.27237
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Axel Marmoret [view email] [v1] Sat, 28 Mar 2026 11:20:38 UTC (501 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivMicrosoft Stock Lands Overdue Lift on Asia AI Growth Plans - Schaeffer's Investment Research
<a href="https://news.google.com/rss/articles/CBMiuAFBVV95cUxOZzd2cUQ4cGU3M3p3LTZfMTZteUx5S01sX0dBemhjSVA0OUk2dnpHZGZjWWRpcDJ6dzNvaUctZHJhSHdEaWdrUTdGQjhwYmdlWlJwdHBDTUJNX01fQUtDVnh3N29CdjFvZFhvWkhIR3FwTjJWd1FNQmgtTHE0MGVVMThVX3V3WWdfUGlGdTNUMFpHQ01xc2YyXzd6ajBadURQeGVYdzhDbm1HYllnTmxmU0NuTjVSTXBt?oc=5" target="_blank">Microsoft Stock Lands Overdue Lift on Asia AI Growth Plans</a> <font color="#6f6f6f">Schaeffer's Investment Research</font>
Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ
<a href="https://news.google.com/rss/articles/CBMiuANBVV95cUxNZUVvVzNvMVZBN09Hc3c5QWoxMUN4MEo1dDIzQjFXd1pkUlVWT25ZQ3pjOS1RMGdONzNrLUpfMHdLYTZUVm5YQkZXSDZxLU5qSkpCR0pYTTRURWtOR2JhMWMtMWVPX1dIRmlwN0lGTkFtVlVaRHdIeEFabFBTQU9FeWFiY3B1NERUNTc1X2N1djhGUk0yYVdUUjFzdGd0eFVuWVZ1TzN1akZCQmtuS3RzNWg3YVNraTFWd0ozX2hISTlUbElnQ29vUUx0WThlUVppU3E5LTNTSllDcXV0dUJUU29mWkVWZDV2OEtRbC1mRWYtWGZQOXUxZ1ppS1B6dThsSXM3V3lfNUxseERsUFVISW1HUXNUSVRhbUNLZ09sSjhDRnNGZXQtdGVsbGZCZXppRUdsdUhrWERta1V5Vm5GeWR6V1pvWE91TUFqQ0tfdU04TkpNTnIwSnd6RGNLWUo2ZVc1dHlzV1lWbi0xdWYtWXJCZ3RwTkdmS1R3Sk5RTl9iTFpTNXVFQktzYWZqYmNRZUpJMHBnRkNUb0tWMG1fV2JzckVhS1Zla0hoOE9Ra2hkdmZFT2lXWA?oc=5" target="_blank">Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models</a> <font color="#6f6f6f">WSJ</font>

Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics
arXiv:2604.00443v1 Announce Type: new Abstract: If the same neuron activates for both "lender" and "riverside," standard metrics attribute the overlap to superposition--the neuron must be compressing two unrelated concepts. This work explores how much of the overlap is due a lexical confound: neurons fire for a shared word form (such as "bank") rather than for two compressed concepts. A 2x2 factorial decomposition reveals that the lexical-only condition (same word, different meaning) consistently exceeds the semantic-only condition (different word, same meaning) across models spanning 110M-70B parameters. The confound carries into sparse autoencoders (18-36% of features blend senses), sits in <=1% of activation dimensions, and hurts downstream tasks: filtering it out improves word sense di
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes
arXiv:2604.00385v1 Announce Type: new Abstract: Type 1 Diabetes (T1D) management requires continuous adjustment of insulin and lifestyle behaviors to maintain blood glucose within a safe target range. Although automated insulin delivery (AID) systems have improved glycemic outcomes, many patients still fail to achieve recommended clinical targets, warranting new approaches to improve glucose control in patients with T1D. While reinforcement learning (RL) has been utilized as a promising approach, current RL-based methods focus primarily on insulin-only treatment and do not provide behavioral recommendations for glucose control. To address this gap, we propose GUIDE, an RL-based decision-support framework designed to complement AID technologies by providing behavioral recommendations to pre

Beyond Symbolic Solving: Multi Chain-of-Thought Voting for Geometric Reasoning in Large Language Models
arXiv:2604.00890v1 Announce Type: new Abstract: Geometric Problem Solving (GPS) remains at the heart of enhancing mathematical reasoning in large language models because it requires the combination of diagrammatic understanding, symbolic manipulation and logical inference. In existing literature, researchers have chiefly focused on synchronising the diagram descriptions with text literals and solving the problem. In this vein, they have either taken a neural, symbolic or neuro-symbolic approach. But this solves only the first two of the requirements, namely diagrammatic understanding and symbolic manipulation, while leaving logical inference underdeveloped. The logical inference is often limited to one chain-of-thought (CoT). To address this weakness in hitherto existing models, this paper

Google research suggests encryption technique used by Bitcoin will be cracked by quantum computers around 2029 — search giant says quantum attacks need to be prepared for now
Google research suggests encryption technique used by Bitcoin will be cracked by quantum computers around 2029 — search giant says quantum attacks need to be prepared for now

ARGS: Auto-Regressive Gaussian Splatting via Parallel Progressive Next-Scale Prediction
arXiv:2604.00494v1 Announce Type: new Abstract: Auto-regressive frameworks for next-scale prediction of 2D images have demonstrated strong potential for producing diverse and sophisticated content by progressively refining a coarse input. However, extending this paradigm to 3D object generation remains largely unexplored. In this paper, we introduce auto-regressive Gaussian splatting (ARGS), a framework for making next-scale predictions in parallel for generation according to levels of detail. We propose a Gaussian simplification strategy and reverse the simplification to guide next-scale generation. Benefiting from the use of hierarchical trees, the generation process requires only \(\mathcal{O}(\log n)\) steps, where \(n\) is the number of points. Furthermore, we propose a tree-based tra

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!