
Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

arXiv eess.AS | by Yushen Chen, Junzhe Liu, Yujie Tu, Zhikang Niu, Yuzhe Liang, Chunyu Qiang, Chen Zhang, Kai Yu, Xie Chen | April 1, 2026

arXiv:2601.13802v2 Announce Type: replace-cross


Abstract: Arabic spans over 30 spoken varieties, yet no open-source text-to-speech system unifies them. Key barriers include substantial cross-dialect lexical and phonological divergence, scarce synthesis-grade data, and the absence of a standardized multi-dialect evaluation benchmark. We present Habibi, a unified-dialectal Arabic TTS framework that addresses all three. Through a multi-step curation pipeline, we repurpose open-source ASR corpora into TTS training data covering 12+ regional dialects. A linguistically-informed curriculum learning strategy - progressing from Modern Standard Arabic to dialectal data - enables robust zero-shot synthesis without text diacritization. We further release the first standardized multi-dialect Arabic TTS benchmark, comprising over 11,000 utterances across 7 dialect subsets with manually verified transcripts. On this benchmark, our unified model matches or surpasses per-dialect specialized models. Both automatic metrics and human evaluations confirm that Habibi is highly competitive with ElevenLabs' Eleven v3 (alpha) in intelligibility, speaker similarity, and naturalness. Extensive ablations (~8,000 H100 GPU hours, 30+ configurations) validate each design choice. We open-source all checkpoints, training and inference code, and benchmark data - the first such release for multi-dialect Arabic TTS - at this https URL.
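
The abstract does not say how the curriculum is scheduled. As a rough illustration only, a two-stage curriculum that trains on Modern Standard Arabic first and dialectal data afterwards might be sketched as follows. The function names, the hard switch at `msa_steps`, and the toy utterance IDs are all assumptions made for this sketch, not the authors' implementation.

```python
import random
from typing import Dict, List


def curriculum_stage(step: int, msa_steps: int) -> str:
    """Pick the data pool for a training step (hypothetical two-stage schedule)."""
    # Assumption: a hard switch from MSA to dialectal data after `msa_steps`.
    return "msa" if step < msa_steps else "dialect"


def sample_batch(
    pools: Dict[str, List[str]],
    step: int,
    msa_steps: int,
    batch_size: int,
    rng: random.Random,
) -> List[str]:
    """Draw a batch (with replacement) from the pool selected for this step."""
    pool = pools[curriculum_stage(step, msa_steps)]
    return [rng.choice(pool) for _ in range(batch_size)]


# Toy placeholder utterance IDs, not real data.
pools = {
    "msa": ["msa_0001", "msa_0002", "msa_0003"],
    "dialect": ["egy_0001", "mar_0001", "irq_0001"],
}
rng = random.Random(0)
early_batch = sample_batch(pools, step=10, msa_steps=100, batch_size=4, rng=rng)
late_batch = sample_batch(pools, step=500, msa_steps=100, batch_size=4, rng=rng)
```

In this sketch, `early_batch` contains only MSA items and `late_batch` only dialectal items. A real schedule could just as well mix the two pools gradually; the abstract leaves that detail open.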

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Cite as: arXiv:2601.13802 [cs.CL] (or arXiv:2601.13802v2 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2601.13802 (arXiv-issued DOI via DataCite)

Submission history

From: Yushen Chen
[v1] Tue, 20 Jan 2026 10:02:11 UTC (922 KB)
[v2] Tue, 31 Mar 2026 12:16:39 UTC (935 KB)
