Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
arXiv:2507.17851v3 Announce Type: replace-cross Abstract: Self-supervised speech models learn representations that capture both content and speaker information. Yet this entanglement creates problems: content tasks suffer from speaker bias, and privacy concerns arise when speaker identity leaks through supposedly anonymized representations. We present two contributions to address these challenges. First, we develop InterpTRQE-SptME (Timbre Residual Quantitative Evaluation Benchmark of Speech pre-training Models Encoding via Interpretability), a benchmark that directly measures residual speaker information in content embeddings using SHAP-based interpretability analysis. Unlike existing indirect metrics, our approach quantifies the exact proportion of speaker information remaining after dis
View PDF HTML (experimental)
Abstract:Self-supervised speech models learn representations that capture both content and speaker information. Yet this entanglement creates problems: content tasks suffer from speaker bias, and privacy concerns arise when speaker identity leaks through supposedly anonymized representations. We present two contributions to address these challenges. First, we develop InterpTRQE-SptME (Timbre Residual Quantitative Evaluation Benchmark of Speech pre-training Models Encoding via Interpretability), a benchmark that directly measures residual speaker information in content embeddings using SHAP-based interpretability analysis. Unlike existing indirect metrics, our approach quantifies the exact proportion of speaker information remaining after disentanglement. Second, we propose InterpTF-SptME, which uses these interpretability insights to filter speaker information from embeddings. Testing on VCTK with seven models including HuBERT, WavLM, and ContentVec, we find that SHAP Noise filtering reduces speaker residuals from 18.05% to nearly zero while maintaining recognition accuracy (CTC loss increase under 1%). The method is model-agnostic and requires no retraining.
Comments: 5 pages, 4 figures
Subjects:
Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2507.17851 [cs.SD]
(or arXiv:2507.17851v3 [cs.SD] for this version)
https://doi.org/10.48550/arXiv.2507.17851
arXiv-issued DOI via DataCite
Submission history
From: Xiaoxu Zhu [view email] [v1] Sat, 19 Jul 2025 04:49:49 UTC (893 KB) [v2] Fri, 24 Oct 2025 09:24:58 UTC (581 KB) [v3] Wed, 1 Apr 2026 02:49:32 UTC (593 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
modelbenchmarktraining
The Self Driving Portfolio: Agentic Architecture for Institutional Asset Management
arXiv:2604.02279v1 Announce Type: cross Abstract: Agentic AI shifts the investor's role from analytical execution to oversight. We present an agentic strategic asset allocation pipeline in which approximately 50 specialized agents produce capital market assumptions, construct portfolios using over 20 competing methods, and critique and vote on each other's output. A researcher agent proposes new portfolio construction methods not yet represented, and a meta-agent compares past forecasts against realized returns and rewrites agent code and prompts to improve future performance. The entire pipeline is governed by the Investment Policy Statement--the same document that guides human portfolio managers can now constrain and direct autonomous agents.

AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
arXiv:2506.03828v3 Announce Type: replace-cross Abstract: AI for Industrial Asset Lifecycle Management aims to automate complex operational workflows, such as condition monitoring and maintenance scheduling, to minimize system downtime. While traditional AI/ML approaches solve narrow tasks in isolation, Large Language Model (LLM) agents offer a next-generation opportunity for end-to-end automation. In this paper, we introduce AssetOpsBench, a unified framework for orchestrating and evaluating domain-specific agents for Industry 4.0. AssetOpsBench provides a multimodal ecosystem comprising a catalog of four domain-specific agents, a curated dataset of 140+ human-authored natural-language queries grounded in real industrial scenarios, and a simulated, CouchDB-backed IoT environment. We intro

PRO-SPECT: Probabilistically Safe Scalable Planning for Energy-Aware Coordinated UAV-UGV Teams in Stochastic Environments
arXiv:2604.02142v1 Announce Type: cross Abstract: We consider energy-aware planning for an unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) team operating in a stochastic environment. The UAV must visit a set of air points in minimum time while respecting energy constraints, relying on the UGV as a mobile charging station. Unlike prior work that assumed deterministic travel times or used fixed robustness margins, we model travel times as random variables and bound the probability of failure (energy depletion) across the entire mission to a user-specified risk level. We formulate the problem as a Mixed-Integer Program and propose PRO-SPECT, a polynomial-time algorithm that generates risk-bounded plans. The algorithm supports both offline planning and online re-planning, enabl
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Models

AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
arXiv:2506.03828v3 Announce Type: replace-cross Abstract: AI for Industrial Asset Lifecycle Management aims to automate complex operational workflows, such as condition monitoring and maintenance scheduling, to minimize system downtime. While traditional AI/ML approaches solve narrow tasks in isolation, Large Language Model (LLM) agents offer a next-generation opportunity for end-to-end automation. In this paper, we introduce AssetOpsBench, a unified framework for orchestrating and evaluating domain-specific agents for Industry 4.0. AssetOpsBench provides a multimodal ecosystem comprising a catalog of four domain-specific agents, a curated dataset of 140+ human-authored natural-language queries grounded in real industrial scenarios, and a simulated, CouchDB-backed IoT environment. We intro

PRO-SPECT: Probabilistically Safe Scalable Planning for Energy-Aware Coordinated UAV-UGV Teams in Stochastic Environments
arXiv:2604.02142v1 Announce Type: cross Abstract: We consider energy-aware planning for an unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) team operating in a stochastic environment. The UAV must visit a set of air points in minimum time while respecting energy constraints, relying on the UGV as a mobile charging station. Unlike prior work that assumed deterministic travel times or used fixed robustness margins, we model travel times as random variables and bound the probability of failure (energy depletion) across the entire mission to a user-specified risk level. We formulate the problem as a Mixed-Integer Program and propose PRO-SPECT, a polynomial-time algorithm that generates risk-bounded plans. The algorithm supports both offline planning and online re-planning, enabl

A Role-Based LLM Framework for Structured Information Extraction from Healthy Food Policies
arXiv:2604.01529v1 Announce Type: cross Abstract: Current Large Language Model (LLM) approaches for information extraction (IE) in the healthy food policy domain are often hindered by various factors, including misinformation, specifically hallucinations, misclassifications, and omissions that result from the structural diversity and inconsistency of policy documents. To address these limitations, this study proposes a role-based LLM framework that automates the IE from unstructured policy data by assigning specialized roles: an LLM policy analyst for metadata and mechanism classification, an LLM legal strategy specialist for identifying complex legal approaches, and an LLM food system expert for categorizing food system stages. This framework mimics expert analysis workflows by incorporat

TRACE: Transparent Web Reliability Assessment with Contextual Explanations
arXiv:2506.12072v4 Announce Type: replace Abstract: In an era of AI-generated misinformation flooding the web, existing tools struggle to empower users with nuanced, transparent assessments of content credibility. They often default to binary (true/false) classifications without contextual justifications, leaving users vulnerable to disinformation. We address this gap by introducing TRACE: Transparent Reliability Assessment with Contextual Explanations, a unified framework that performs two key tasks: (1) it assigns a fine-grained, continuous reliability score (from 0.1 to 1.0) to web content, and (2) it generates a contextual explanation for its assessment. The core of TRACE is the TrueGL-1B model, fine-tuned on a novel, large-scale dataset of over 140,000 articles. This dataset's primary


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!