
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability

arXiv eess.AS · [Submitted on 19 Jul 2025 (v1), last revised 1 Apr 2026 (this version, v3)]

arXiv:2507.17851v3 Announce Type: replace-cross


Abstract: Self-supervised speech models learn representations that capture both content and speaker information. Yet this entanglement creates problems: content tasks suffer from speaker bias, and privacy concerns arise when speaker identity leaks through supposedly anonymized representations. We present two contributions to address these challenges. First, we develop InterpTRQE-SptME (Timbre Residual Quantitative Evaluation Benchmark of Speech pre-training Models Encoding via Interpretability), a benchmark that directly measures residual speaker information in content embeddings using SHAP-based interpretability analysis. Unlike existing indirect metrics, our approach quantifies the exact proportion of speaker information remaining after disentanglement. Second, we propose InterpTF-SptME, which uses these interpretability insights to filter speaker information from embeddings. Testing on VCTK with seven models including HuBERT, WavLM, and ContentVec, we find that SHAP Noise filtering reduces speaker residuals from 18.05% to nearly zero while maintaining recognition accuracy (CTC loss increase under 1%). The method is model-agnostic and requires no retraining.
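The pipeline the abstract describes — attribute a speaker classifier's decisions to individual embedding dimensions, then drown the most speaker-attributed dimensions in noise — can be sketched on toy data. The sketch below is NOT the paper's implementation: it substitutes permutation importance for SHAP attributions, a nearest-centroid classifier for a real speaker-ID model, and synthetic vectors for HuBERT/WavLM embeddings; all function names and the accuracy-above-chance residual proxy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_acc(X, y):
    """Accuracy of a nearest-centroid speaker classifier
    (fit and scored on the same data; purely illustrative)."""
    labels = np.unique(y)
    centroids = np.stack([X[y == lab].mean(axis=0) for lab in labels])
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return float((labels[dists.argmin(axis=1)] == y).mean())

def residual_ratio(X, y):
    """Proxy for residual speaker information: speaker classification
    accuracy above chance, normalized to [0, 1]."""
    chance = 1.0 / len(np.unique(y))
    return max(nearest_centroid_acc(X, y) - chance, 0.0) / (1.0 - chance)

def dim_importance(X, y, n_repeats=5):
    """Per-dimension permutation importance of the speaker classifier —
    a crude stand-in for the per-feature SHAP attributions in the paper."""
    base = nearest_centroid_acc(X, y)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(X[:, j])  # break dim j's link to speaker
            drops.append(base - nearest_centroid_acc(Xp, y))
        imp[j] = max(float(np.mean(drops)), 0.0)
    return imp

def noise_filter(X, imp, top_k=2, scale=3.0):
    """'SHAP Noise'-style filtering (illustrative): add Gaussian noise to
    the top_k most speaker-attributed dimensions, leaving the rest intact."""
    Xf = X.copy()
    idx = np.argsort(imp)[::-1][:top_k]
    Xf[:, idx] += rng.normal(0.0, scale * X[:, idx].std(axis=0),
                             size=(len(X), top_k))
    return Xf
```

On synthetic "embeddings" where only the first two dimensions carry a speaker offset, the importance scores concentrate on those dimensions, and the filtered vectors score much closer to chance on speaker classification — mirroring, at toy scale, the residual reduction the benchmark is designed to measure.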

Comments: 5 pages, 4 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Cite as: arXiv:2507.17851 [cs.SD]

(or arXiv:2507.17851v3 [cs.SD] for this version)

https://doi.org/10.48550/arXiv.2507.17851

arXiv-issued DOI via DataCite

Submission history

From: Xiaoxu Zhu [view email] [v1] Sat, 19 Jul 2025 04:49:49 UTC (893 KB) [v2] Fri, 24 Oct 2025 09:24:58 UTC (581 KB) [v3] Wed, 1 Apr 2026 02:49:32 UTC (593 KB)
