Research Papers research paper arxiv ai artificial-intelligence

A Human-Inspired Decoupled Architecture for Efficient Audio Representation Learning

arXivby [Submitted on 27 Mar 2026]March 30, 20261 min read1 views

arXiv:2603.26098v1 Announce Type: cross Abstract: While self-supervised learning (SSL) has revolutionized audio representation, the excessive parameterization and quadratic computational cost of standard Transformers limit their deployment on resource-constrained devices. To address this bottleneck, we propose HEAR (Human-inspired Efficient Audio Representation), a novel decoupled architecture. Inspired by the human cognitive ability to isolate local acoustic features from global context, HEAR splits the processing pipeline into two dedicated modules: an Acoustic Model for local feature extrac — Harunori Kawano, Takeshi Sasaki

View PDF HTML (experimental)

Abstract:While self-supervised learning (SSL) has revolutionized audio representation, the excessive parameterization and quadratic computational cost of standard Transformers limit their deployment on resource-constrained devices. To address this bottleneck, we propose HEAR (Human-inspired Efficient Audio Representation), a novel decoupled architecture. Inspired by the human cognitive ability to isolate local acoustic features from global context, HEAR splits the processing pipeline into two dedicated modules: an Acoustic Model for local feature extraction and a Task Model for global semantic integration. Coupled with an Acoustic Tokenizer trained via knowledge distillation, our approach enables robust Masked Audio Modeling (MAM). Extensive experiments demonstrate that HEAR requires only 15M parameters and 9.47 GFLOPs for inference, operating at a fraction of the computational cost of conventional foundation models (which typically require 85M-94M parameters). Despite this high efficiency, HEAR achieves highly competitive performance across diverse audio classification benchmarks. The code and pre-trained models are available at this https URL

Subjects:

Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2603.26098 [cs.SD]

(or arXiv:2603.26098v1 [cs.SD] for this version)

https://doi.org/10.48550/arXiv.2603.26098

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Harunori Kawano [view email] [v1] Fri, 27 Mar 2026 06:09:03 UTC (557 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26098

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Research PapersFresh

Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis

arXiv:2604.02678v1 Announce Type: cross Abstract: Clinical evidence synthesis requires identifying relevant trials from large registries and aggregating results that account for population differences. While recent LLM-based approaches have automated components of systematic review, they do not support end-to-end evidence synthesis. Moreover, conventional meta-analysis weights studies by statistical precision without considering clinical compatibility reflected in eligibility criteria. We propose EligMeta, an agentic framework that integrates automated trial discovery with eligibility-aware me — Yao Zhao, Zhiyue Zhang, Yanxun Xu

arXiv

10mabout 4 hours ago

Research PapersFresh

Generalization Limits of Reinforcement Learning Alignment

arXiv:2604.02652v1 Announce Type: cross Abstract: The safety of large language models (LLMs) relies on alignment techniques such as reinforcement learning from human feedback (RLHF). However, recent theoretical analyses suggest that reinforcement learning-based training does not acquire new capabilities but merely redistributes the utilization probabilities of existing ones. In this study, we propose ``compound jailbreaks'' targeting OpenAI gpt-oss-20b, which exploit the generalization failures of alignment. This approach combines multiple attack techniques -- each individually defended agains — Haruhi Shida, Koo Imai, Keigo Kansa

arXiv

10mabout 4 hours ago

Research PapersFresh

Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training

arXiv:2604.02651v1 Announce Type: cross Abstract: Graph neural networks (GNNs) are widely used for learning on graph datasets derived from various real-world scenarios. Learning from extremely large graphs requires distributed training, and mini-batching with sampling is a popular approach for parallelizing GNN training. Existing distributed mini-batch approaches have significant performance bottlenecks due to expensive sampling methods and limited scaling when using data parallelism. In this work, we present ScaleGNN, a 4D parallel framework for scalable mini-batch GNN training that combines — Cunyang Wei, Siddharth Singh, Aishwarya Sarkar, Daniel Nichols, Tisha Patel, Aditya K. Ranjan, Sayan Ghosh, Ali Jannesari, Nathan R. Tallent, Abhinav Bhatele

arXiv

10mabout 4 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 312 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersFresh

Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis

arXiv

10mabout 4 hours ago

Research PapersFresh

Generalization Limits of Reinforcement Learning Alignment

arXiv

10mabout 4 hours ago

Research PapersFresh

Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training

arXiv

10mabout 4 hours ago

Research PapersFresh

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

arXiv:2604.02633v1 Announce Type: cross Abstract: Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining solely class-level prototype representations rather than raw graph examples for mitigating catastrophic forgetting. However, this design choice inevitably precipitates feature drift. As a nascent alternative, Analytic Continual Learning (ACL) capitalizes on the intrinsic generalization properties of frozen pre-trained models to bolster continual learning performance. Nonetheless, a key drawback resides in the pr — Lei Song, Shihan Guan, Youyong Kong

arXiv

10mabout 4 hours ago