
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability

arXiv eess.AS · [Submitted on 19 Jul 2025 (v1), last revised 1 Apr 2026 (this version, v3)]

arXiv:2507.17851v3 Announce Type: replace-cross


Abstract: Self-supervised speech models learn representations that capture both content and speaker information. Yet this entanglement creates problems: content tasks suffer from speaker bias, and privacy concerns arise when speaker identity leaks through supposedly anonymized representations. We present two contributions to address these challenges. First, we develop InterpTRQE-SptME (Timbre Residual Quantitative Evaluation Benchmark of Speech pre-training Models Encoding via Interpretability), a benchmark that directly measures residual speaker information in content embeddings using SHAP-based interpretability analysis. Unlike existing indirect metrics, our approach quantifies the exact proportion of speaker information remaining after disentanglement. Second, we propose InterpTF-SptME, which uses these interpretability insights to filter speaker information from embeddings. Testing on VCTK with seven models including HuBERT, WavLM, and ContentVec, we find that SHAP Noise filtering reduces speaker residuals from 18.05% to nearly zero while maintaining recognition accuracy (CTC loss increase under 1%). The method is model-agnostic and requires no retraining.
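The pipeline the abstract describes — attribute a speaker classifier's decisions to individual embedding dimensions, then drown the most speaker-attributed dimensions in noise — can be sketched on toy data. The sketch below is NOT the paper's implementation: it substitutes permutation importance for SHAP attributions, a nearest-centroid classifier for a real speaker-ID model, and synthetic vectors for HuBERT/WavLM embeddings; all function names and the accuracy-above-chance residual proxy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_acc(X, y):
    """Accuracy of a nearest-centroid speaker classifier
    (fit and scored on the same data; purely illustrative)."""
    labels = np.unique(y)
    centroids = np.stack([X[y == lab].mean(axis=0) for lab in labels])
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return float((labels[dists.argmin(axis=1)] == y).mean())

def residual_ratio(X, y):
    """Proxy for residual speaker information: speaker classification
    accuracy above chance, normalized to [0, 1]."""
    chance = 1.0 / len(np.unique(y))
    return max(nearest_centroid_acc(X, y) - chance, 0.0) / (1.0 - chance)

def dim_importance(X, y, n_repeats=5):
    """Per-dimension permutation importance of the speaker classifier —
    a crude stand-in for the per-feature SHAP attributions in the paper."""
    base = nearest_centroid_acc(X, y)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(X[:, j])  # break dim j's link to speaker
            drops.append(base - nearest_centroid_acc(Xp, y))
        imp[j] = max(float(np.mean(drops)), 0.0)
    return imp

def noise_filter(X, imp, top_k=2, scale=3.0):
    """'SHAP Noise'-style filtering (illustrative): add Gaussian noise to
    the top_k most speaker-attributed dimensions, leaving the rest intact."""
    Xf = X.copy()
    idx = np.argsort(imp)[::-1][:top_k]
    Xf[:, idx] += rng.normal(0.0, scale * X[:, idx].std(axis=0),
                             size=(len(X), top_k))
    return Xf
```

On synthetic "embeddings" where only the first two dimensions carry a speaker offset, the importance scores concentrate on those dimensions, and the filtered vectors score much closer to chance on speaker classification — mirroring, at toy scale, the residual reduction the benchmark is designed to measure.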

Comments: 5 pages, 4 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Cite as: arXiv:2507.17851 [cs.SD]

(or arXiv:2507.17851v3 [cs.SD] for this version)

https://doi.org/10.48550/arXiv.2507.17851

arXiv-issued DOI via DataCite

Submission history

From: Xiaoxu Zhu [view email] [v1] Sat, 19 Jul 2025 04:49:49 UTC (893 KB) [v2] Fri, 24 Oct 2025 09:24:58 UTC (581 KB) [v3] Wed, 1 Apr 2026 02:49:32 UTC (593 KB)
