Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
Abstract: Arabic spans over 30 spoken varieties, yet no open-source text-to-speech system unifies them. Key barriers include substantial cross-dialect lexical and phonological divergence, scarce synthesis-grade data, and the absence of a standardized multi-dialect evaluation benchmark. We present Habibi, a unified-dialectal Arabic TTS framework that addresses all three. Through a multi-step curation pipeline, we repurpose open-source ASR corpora into TTS training data covering 12+ regional dialects. A linguistically informed curriculum learning strategy, progressing from Modern Standard Arabic to dialectal data, enables robust zero-shot synthesis without text diacritization. We further release the first standardized multi-dialect Arabic TTS benchmark, comprising over 11,000 utterances across 7 dialect subsets with manually verified transcripts. On this benchmark, our unified model matches or surpasses per-dialect specialized models. Both automatic metrics and human evaluations confirm that Habibi is highly competitive with ElevenLabs' Eleven v3 (alpha) in intelligibility, speaker similarity, and naturalness. Extensive ablations (~8,000 H100 GPU hours, 30+ configurations) validate each design choice. We open-source all checkpoints, training and inference code, and benchmark data, the first such release for multi-dialect Arabic TTS, at this https URL.
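The curriculum described in the abstract, progressing from Modern Standard Arabic (MSA) to dialectal data, can be sketched as a data-mixing schedule. The sampler below is a hypothetical illustration of that idea only; the function names, warmup/ramp parameters, and linear schedule are assumptions, not the paper's implementation.

```python
import random

def dialect_fraction(step, warmup_steps=10_000, ramp_steps=40_000):
    """Fraction of each batch drawn from dialectal data.

    Starts at 0.0 (pure MSA) during warmup, then ramps linearly
    to 1.0 -- a hypothetical schedule illustrating the
    MSA-to-dialect curriculum idea.
    """
    if step < warmup_steps:
        return 0.0
    return min(1.0, (step - warmup_steps) / ramp_steps)

def sample_batch(msa_pool, dialect_pool, step, batch_size=8, rng=random):
    """Mix MSA and dialectal utterances per the current schedule."""
    frac = dialect_fraction(step)
    n_dialect = round(batch_size * frac)
    batch = rng.sample(dialect_pool, n_dialect)
    batch += rng.sample(msa_pool, batch_size - n_dialect)
    rng.shuffle(batch)  # avoid a fixed dialect/MSA ordering in the batch
    return batch
```

Early in training every batch is pure MSA; after the ramp completes, batches are fully dialectal. A real pipeline would likely mix per-dialect pools and weight them by data availability.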
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2601.13802 [cs.CL]
(or arXiv:2601.13802v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2601.13802
arXiv-issued DOI via DataCite
Submission history
From: Yushen Chen [view email] [v1] Tue, 20 Jan 2026 10:02:11 UTC (922 KB) [v2] Tue, 31 Mar 2026 12:16:39 UTC (935 KB)