Speech LLMs are Contextual Reasoning Transcribers

arXiv cs.CL | Submitted on 1 Apr 2026 | April 2, 2026

Abstract: Despite extensions to speech inputs, effectively leveraging the rich knowledge and contextual understanding of large language models (LLMs) in automatic speech recognition (ASR) remains non-trivial, as the task primarily involves direct speech-to-text mapping. To address this, this paper proposes chain-of-thought ASR (CoT-ASR), which constructs a reasoning chain that enables LLMs to first analyze the input speech and generate contextual analysis, thereby fully exploiting their generative capabilities. With this contextual reasoning, CoT-ASR then performs more informed speech recognition and completes both reasoning and transcription in a single pass. Moreover, CoT-ASR naturally supports user-guided transcription: while designed to self-generate reasoning, it can also seamlessly incorporate user-provided context to guide transcription, further extending ASR functionality. To reduce the modality gap, this paper introduces a CTC-guided Modality Adapter, which uses CTC non-blank token probabilities to weight LLM embeddings, efficiently aligning speech encoder outputs with the LLM's textual latent space. Experiments show that, compared to standard LLM-based ASR, CoT-ASR achieves a relative reduction of 8.7% in word error rate (WER) and 16.9% in entity error rate (EER).
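The abstract describes the CTC-guided Modality Adapter in a single sentence, so the following is only a minimal sketch of the general idea of weighting LLM token embeddings by CTC non-blank probabilities, not the paper's implementation. The function name `ctc_weighted_embeddings`, the placement of the blank symbol in the last column, the shared vocabulary between the CTC head and the LLM, and the optional renormalization step are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def ctc_weighted_embeddings(
    ctc_probs: Matrix,
    llm_embeddings: Matrix,
    renormalize: bool = True,
) -> Matrix:
    """Mix LLM token embeddings per frame using CTC non-blank probabilities.

    ctc_probs:      T x (V + 1) frame-level CTC posteriors.
                    Assumption: the blank symbol is the LAST column.
    llm_embeddings: V x D token embedding table.
                    Assumption: CTC index v maps to LLM token v.
    renormalize:    Assumption: rescale non-blank mass to sum to 1 per frame.
    Returns a T x D matrix of embeddings in the LLM's textual space.
    """
    vocab_size = len(llm_embeddings)
    dim = len(llm_embeddings[0])
    outputs: Matrix = []
    for frame in ctc_probs:
        if len(frame) != vocab_size + 1:
            raise ValueError("each frame needs V non-blank entries plus one blank")
        # Drop the blank probability; keep only non-blank token mass.
        non_blank = frame[:vocab_size]
        total = sum(non_blank)
        if renormalize and total > 0:
            non_blank = [p / total for p in non_blank]
        # Probability-weighted sum of the embedding rows.
        mixed = [
            sum(p * llm_embeddings[v][d] for v, p in enumerate(non_blank))
            for d in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Toy example: 2 tokens with 2-dimensional embeddings, 2 frames.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
toy_ctc = [
    [0.5, 0.25, 0.25],  # mostly token 0, some token 1, some blank
    [0.0, 0.0, 1.0],    # pure blank frame
]
raw = ctc_weighted_embeddings(toy_ctc, toy_embeddings, renormalize=False)
scaled = ctc_weighted_embeddings(toy_ctc, toy_embeddings, renormalize=True)
```

In this toy setup a frame dominated by blank contributes a near-zero vector, which is one plausible reading of why non-blank probabilities are a useful weighting signal; how the paper actually treats such frames is not stated in the abstract.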

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2604.00610 [cs.CL] (or arXiv:2604.00610v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.00610

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Keqi Deng
[v1] Wed, 1 Apr 2026 08:13:50 UTC (248 KB)
