
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

arXiv · March 31, 2026

arXiv:2510.06961v4 (announce type: replace-cross)

Authors: Vaibhav Srivastav, Steven Zheng, Eric Bezzam, Eustache Le Bihan, Nithin Rao Koluguri, Piotr Żelasko, Somshubra Majumdar, Adel Moumen, Sanchit Gandhi


Abstract: We present the Open ASR Leaderboard, a reproducible benchmarking platform with community contributions from academia and industry. It compares 86 open-source and proprietary systems across 12 datasets, with English short- and long-form and multilingual short-form tracks. We standardize word error rate (WER) and inverse real-time factor (RTFx) evaluation for consistent accuracy-efficiency comparisons across model architectures and toolkits (e.g., ESPNet, NeMo, SpeechBrain, Transformers). We observe that Conformer-based encoders paired with transformer-based decoders achieve the best average WER, while connectionist temporal classification (CTC) and token-and-duration transducer (TDT) decoders offer superior RTFx, making them better suited for long-form and batched processing. All code and dataset loaders are open-sourced to support transparent, extensible evaluation. We present our evaluation methodology to facilitate community-driven benchmarking in ASR and other tasks.
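The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and RTFx is audio duration divided by processing time, so higher means faster. The function names `word_error_rate` and `inverse_real_time_factor` are hypothetical and are not taken from the leaderboard's code. The sketch also skips text normalization, which a real evaluation would likely need to handle consistently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: words are split on whitespace and compared
    verbatim, with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute (higher is faster)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # One hour of audio transcribed in 36 seconds.
    print(inverse_real_time_factor(3600.0, 36.0))
```

Reporting both numbers side by side is what makes the accuracy-efficiency comparison in the abstract possible: a system with a slightly higher WER may still be preferable for long-form or batched workloads if its RTFx is much larger.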

Comments: Leaderboard: this https URL ; Code: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Cite as: arXiv:2510.06961 [cs.CL] (or arXiv:2510.06961v4 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.06961 (arXiv-issued DOI via DataCite)

Submission history

From: Eric Bezzam
[v1] Wed, 8 Oct 2025 12:44:51 UTC (25 KB)
[v2] Thu, 9 Oct 2025 07:39:28 UTC (25 KB)
[v3] Wed, 10 Dec 2025 17:30:55 UTC (23 KB)
[v4] Mon, 30 Mar 2026 09:52:05 UTC (2,783 KB)
