Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation

arXiv eess.AS · [Submitted on 31 Mar 2026] · April 1, 2026

🧒 Explain Like I'm 5 (simple language)

Imagine you're at a super noisy birthday party! Lots of friends are talking at once, music is playing, and grown-ups are chatting. It's hard to hear just one friend, right?

Scientists made a special computer helper, like a superhero ear! Its name is SR-CorrNet. This superhero ear helps computers listen to all the messy sounds and pick out just one person's voice, even when it's super loud and messy.

It's like having a magic sieve that sifts out all the noise and only lets the voice you want come through. So, computers can understand us better, even when there's a big party happening! Yay for clear listening!

Abstract: Speech separation in realistic acoustic environments remains challenging because overlapping speakers, background noise, and reverberation must be resolved simultaneously. Although recent time-frequency (TF) domain models have shown strong performance, most still rely on late-split architectures, where speaker disentanglement is deferred to the final stage, creating an information bottleneck and weakening discriminability under adverse conditions. To address this issue, we propose SR-CorrNet, an asymmetric encoder-decoder framework that introduces the separation-reconstruction (SepRe) strategy into a TF dual-path backbone. The encoder performs coarse separation from mixture observations, while the weight-shared decoder progressively reconstructs speaker-discriminative features with cross-speaker interaction, enabling stage-wise refinement. To complement this architecture, we formulate speech separation as a structured correlation-to-filter problem: spatio-spectro-temporal correlations computed from the observations are used as input features, and the corresponding deep filters are estimated to recover target signals. We further incorporate an attractor-based dynamic split module to adapt the number of output streams to the actual speaker configuration. Experimental results on WSJ0-2/3/4/5Mix, WHAMR!, and LibriCSS demonstrate consistent improvements across anechoic, noisy-reverberant, and real-recorded conditions in both single- and multi-channel settings, highlighting the effectiveness of TF-domain SepRe with correlation-based filter estimation for speech separation.
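The correlation-to-filter idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the exact feature or filter definitions, so the sketch assumes a single-channel STFT, uses only temporal lags (no spatial or cross-frequency terms), and applies a generic multi-frame deep filter. The function names tf_correlation_features and apply_deep_filter, the max_lag parameter, and the filter convention Y[t, f] = sum_k conj(W[k, t, f]) X[t-k, f] are all hypothetical choices made for illustration. In a real system the filter taps W would be predicted by a network from the correlation features; here they are set by hand.

```python
import numpy as np


def tf_correlation_features(X, max_lag=2):
    """Temporal correlations of a complex STFT X with shape (frames, bins).

    Returns an array of shape (max_lag + 1, frames, bins) whose entry
    [lag, t, f] is X[t, f] * conj(X[t - lag, f]); frames with t < lag
    are left at zero. This is an assumed, simplified feature definition.
    """
    T, F = X.shape
    feats = np.zeros((max_lag + 1, T, F), dtype=complex)
    for lag in range(max_lag + 1):
        feats[lag, lag:] = X[lag:] * np.conj(X[:T - lag])
    return feats


def apply_deep_filter(X, W):
    """Apply per-bin multi-frame filter taps W with shape (taps, frames, bins).

    Computes Y[t, f] = sum_k conj(W[k, t, f]) * X[t - k, f], treating
    frames before the start of the signal as zero.
    """
    K, T, F = W.shape
    Y = np.zeros((T, F), dtype=complex)
    for k in range(K):
        Y[k:] += np.conj(W[k, k:]) * X[:T - k]
    return Y


# Toy data standing in for a mixture STFT (6 frames, 4 frequency bins).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))

feats = tf_correlation_features(X, max_lag=2)

# Hand-set filter: only the zero-delay tap is active, so the output
# reproduces the input. A trained model would estimate W instead.
W = np.zeros((3, 6, 4), dtype=complex)
W[0] = 1.0
Y = apply_deep_filter(X, W)
```

With the zero-delay tap set to one, the filter acts as an identity, which makes the sketch easy to sanity-check; the lag-0 correlation is simply the power spectrogram |X|^2.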

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (T-ASLP)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Cite as: arXiv:2603.29097 [eess.AS] (or arXiv:2603.29097v1 [eess.AS] for this version)

https://doi.org/10.48550/arXiv.2603.29097

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ui-Hyeop Shin. [v1] Tue, 31 Mar 2026 00:37:15 UTC (1,538 KB)

Original source

arXiv eess.AS

https://arxiv.org/abs/2603.29097
