Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language
Hanif Rahman, Shafeeq ur Rehman
arXiv:2603.27021v1 Announce Type: new
Abstract: We present the Pashto Common Voice corpus, the first large-scale, openly licensed speech resource for Pashto, a language with over 60 million native speakers that is largely absent from open speech technology. Through a community effort spanning 2022-2025, the corpus grew from 1.5 hours and 5 contributors to 147 total hours and 1,483 unique speakers across ten Mozilla Common Voice releases (CV14-CV23). Speaker participation increased approximately 108-fold between CV17 and CV18, coinciding with a VOA Pashto broadcast campaign. We describe the full methodology: interface localisation, Wikipedia-based sentence extraction with automated filtering, phonemically targeted contributions for the four most frequently dropped Pashto characters, and multi-channel community outreach. MCV23 contains 107,781 clips (60,337 validated; 82.33 validated hours) across 13 content domains. Fine-tuning Whisper Base on MCV20 yields 13.4% WER on the MCV20 test split, compared with the published Whisper Base zero-shot WER of 99.0% on Pashto.
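The abstract mentions Wikipedia-based sentence extraction with automated filtering but does not list the filter rules. The sketch below shows one plausible shape for such a filter, assuming rules of the kind Common Voice asks of submitted sentences (short, no digits) plus a script check: keep a sentence only if it has between `MIN_WORDS` and `MAX_WORDS` words, contains no ASCII, Arabic-Indic, or Extended Arabic-Indic digits, and uses only Arabic-block characters (U+0600-U+06FF, which covers the Pashto-specific letters), whitespace, and basic punctuation. The thresholds and the names `keep_sentence` and `filter_sentences` are hypothetical; this is not the authors' pipeline.

```python
import re

# Hypothetical thresholds, chosen for illustration only (not from the paper).
MIN_WORDS = 2
MAX_WORDS = 14

# ASCII, Arabic-Indic (U+0660-U+0669) and Extended Arabic-Indic (U+06F0-U+06F9) digits.
_DIGITS = re.compile(r"[0-9\u0660-\u0669\u06F0-\u06F9]")

# Basic ASCII punctuation allowed in addition to the Arabic block.
_EXTRA_PUNCT = set(".!")


def _is_allowed_char(ch: str) -> bool:
    """True for whitespace, any Arabic-block character, or basic punctuation."""
    return ch.isspace() or "\u0600" <= ch <= "\u06FF" or ch in _EXTRA_PUNCT


def keep_sentence(sentence: str) -> bool:
    """Apply the assumed length, digit and script rules to one sentence."""
    s = sentence.strip()
    if not s or _DIGITS.search(s):
        return False  # empty, or contains digits that readers would voice inconsistently
    n_words = len(s.split())
    if not MIN_WORDS <= n_words <= MAX_WORDS:
        return False  # too short or too long to record comfortably
    return all(_is_allowed_char(ch) for ch in s)  # rejects Latin text, emoji, etc.


def filter_sentences(sentences):
    """Normalize whitespace, drop sentences failing the rules, and de-duplicate in order."""
    seen = set()
    kept = []
    for raw in sentences:
        s = " ".join(raw.split())  # collapse runs of whitespace
        if s not in seen and keep_sentence(s):
            seen.add(s)
            kept.append(s)
    return kept
```

For example, `filter_sentences(["زه ښوونځي ته ځم.", "Hello world"])` keeps only the Pashto sentence. A real pipeline would likely also need rules this sketch omits, such as handling the zero-width non-joiner or Arabic/Persian look-alike letters.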
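The 13.4% and 99.0% figures are word error rates: WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in the minimum edit alignment between reference and hypothesis, and N is the number of reference words. The sketch below computes it per utterance and pooled over a test split (total edits divided by total reference words). The abstract does not describe the text normalization applied before scoring (punctuation, diacritics, Unicode variants of Pashto letters), so none is applied here; `word_error_rate` and `corpus_wer` are illustrative names.

```python
def _edit_distance(ref, hyp):
    """Word-level Levenshtein distance using a single rolling row."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to hyp[:j]
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one utterance; whitespace tokenization, no normalization."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return _edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs) -> float:
    """Pooled WER over (reference, hypothesis) pairs: total edits / total reference words."""
    edits = 0
    words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += _edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words
```

Because insertions count against N, WER is not capped at 100%: `word_error_rate("a", "a b c")` returns 2.0. Pooled WER also weights long utterances more heavily than averaging per-utterance scores would.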
Comments: Submitted to Interspeech 2026
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.27021 [cs.CL]
(or arXiv:2603.27021v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.27021
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Hanif Rahman
[v1] Fri, 27 Mar 2026 22:22:03 UTC (34 KB)