Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones

arXiv cs.HC | by Tianle Yang, Chengzhe Sun, Phil Rose, Siwei Lyu | April 3, 2026

arXiv:2604.01562v1 Announce Type: cross

Abstract: Voice cloning is often evaluated in terms of overall quality, but less is known about accent preservation and its perceptual consequences. We compare standard and heavily accented Mandarin speech and their voice clones using a combined computational and perceptual design. Embedding-based analyses show no reliable accented-standard difference in original-clone distances across systems. In the perception study, clones are rated as more similar to their originals for standard than for accented speakers, and intelligibility increases from original to clone, with a larger gain for accented speech. These results show that accent variation can shape perceived identity match and intelligibility in voice cloning even when it is not reflected in an off-the-shelf speaker-embedding distance, and they motivate evaluating speaker identity preservation and accent preservation as separable dimensions.
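To make the phrase "original-clone distances" concrete, the following is a minimal sketch of how such a comparison could be set up. It is not the authors' pipeline: the abstract names neither the embedding model nor the distance metric, so cosine distance is an assumption here, and the helper names (`cosine_distance`, `mean_pair_distance`) and the three-dimensional toy vectors are hypothetical placeholders, not data from the paper.

```python
import math
from statistics import mean


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_pair_distance(pairs):
    """Average distance over (original_embedding, clone_embedding) pairs."""
    return mean(cosine_distance(orig, clone) for orig, clone in pairs)


# Toy placeholder vectors only (NOT results from the paper). In practice each
# vector would come from a speaker-embedding model applied to a recording.
standard_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 1.0, 0.0]),
]
accented_pairs = [
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    ([0.0, 0.0, 2.0], [0.0, 0.0, 5.0]),
]

# Difference in mean original-clone distance between the two speaker groups.
gap = mean_pair_distance(accented_pairs) - mean_pair_distance(standard_pairs)
print(f"accented minus standard mean distance: {gap:.3f}")
```

A real analysis would also need a statistical test over many speakers and cloning systems before calling any group difference reliable; the sketch only shows the per-group distance being compared.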

Subjects:

Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Cite as: arXiv:2604.01562 [cs.SD]

(or arXiv:2604.01562v1 [cs.SD] for this version)

https://doi.org/10.48550/arXiv.2604.01562

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Tianle Yang [v1] Thu, 2 Apr 2026 03:17:41 UTC (98 KB)

Original source

arXiv cs.HC

https://arxiv.org/abs/2604.01562
