An Empirical Recipe for Universal Phone Recognition
Abstract: Phone recognition (PR) is a key enabler of multilingual and low-resource speech processing tasks, yet robust performance remains elusive. Highly performant English-focused models do not generalize across languages, while multilingual models underutilize pretrained representations. It also remains unclear how data scale, architecture, and training objective contribute to multilingual PR. We present PhoneticXEUS -- trained on large-scale multilingual data and achieving state-of-the-art performance on both multilingual (17.7% PFER) and accented English speech (10.6% PFER). Through controlled ablations with evaluations across 100+ languages under a unified scheme, we empirically establish our training recipe and quantify the impact of SSL representations, data scale, and loss objectives. In addition, we analyze error patterns across language families, accented speech, and articulatory features. All data and code are released openly.
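The abstract reports results in PFER, an edit-distance-based error rate over phone sequences (the paper's exact feature-level definition is not given here). As a hedged illustration of how such a metric is typically computed, the sketch below implements the closely related phone error rate: Levenshtein distance between the hypothesis and reference phone sequences, normalized by reference length. The function names and the feature-free token comparison are illustrative assumptions, not the paper's implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of tokens)."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = min edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # i deletions
    for j in range(n + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[m][n]

def phone_error_rate(ref, hyp):
    """Edit distance normalized by reference length (as a fraction, not %)."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

# One substitution against a 3-phone reference:
per = phone_error_rate(["p", "a", "t"], ["b", "a", "t"])  # 1/3
```

A feature-level variant (PFER-style) would replace the 0/1 substitution cost with a distance between articulatory feature vectors of the two phones; the dynamic program itself is unchanged.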
Comments: Submitted to Interspeech 2026. Code: this https URL
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.29042 [cs.CL]
(or arXiv:2603.29042v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29042
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Shikhar Bharadwaj [view email] [v1] Mon, 30 Mar 2026 22:12:48 UTC (99 KB)