AI NEWS HUB by EIGENVECTOR

Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones

arXiv cs.HC
by Tianle Yang, Chengzhe Sun, Phil Rose, Siwei Lyu
April 3, 2026

arXiv:2604.01562v1 Announce Type: cross


Abstract: Voice cloning is often evaluated in terms of overall quality, but less is known about accent preservation and its perceptual consequences. We compare standard and heavily accented Mandarin speech and their voice clones using a combined computational and perceptual design. Embedding-based analyses show no reliable accented-standard difference in original-clone distances across systems. In the perception study, clones are rated as more similar to their originals for standard than for accented speakers, and intelligibility increases from original to clone, with a larger gain for accented speech. These results show that accent variation can shape perceived identity match and intelligibility in voice cloning even when it is not reflected in an off-the-shelf speaker-embedding distance, and they motivate evaluating speaker identity preservation and accent preservation as separable dimensions.
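As a rough illustration of the embedding-based comparison the abstract describes, the sketch below computes a distance between each original recording's speaker embedding and its clone's embedding, then averages that distance per speaker group. The abstract does not name the embedding model or the distance metric, so cosine distance, the function names, and the toy vectors here are all assumptions for illustration, not the authors' method or data.

```python
import math
from statistics import mean


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_original_clone_distance(pairs):
    """Average original-to-clone distance over (original, clone) embedding pairs."""
    return mean(cosine_distance(orig, clone) for orig, clone in pairs)


# Hypothetical placeholder embeddings (NOT data from the paper).
# In practice each vector would come from a speaker-embedding model.
standard_pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 2.0])]
accented_pairs = [([1.0, 0.0], [0.0, 1.0])]

standard_mean = mean_original_clone_distance(standard_pairs)
accented_mean = mean_original_clone_distance(accented_pairs)
print("standard:", standard_mean, "accented:", accented_mean)
```

A comparison like this captures only what the embedding encodes. The abstract's point is that listener ratings of similarity and intelligibility can differ between groups even when such distances do not.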

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Cite as: arXiv:2604.01562 [cs.SD]

(or arXiv:2604.01562v1 [cs.SD] for this version)

https://doi.org/10.48550/arXiv.2604.01562

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Tianle Yang
[v1] Thu, 2 Apr 2026 03:17:41 UTC (98 KB)
