
Distilling Conversations: Abstract Compression of Conversational Audio Context for LLM-based ASR

arXiv · March 30, 2026

arXiv:2603.26246v1 · Announce Type: cross

Authors: Shashi Kumar, Esaú Villatoro-Tello, Sergio Burdisso, Kadri Hacioglu, Thibault Bañeras-Roux, Hasindri Watawana, Dairazalia Sanchez-Cortes, Srikanth Madikeri, Petr Motlicek, Andreas Stolcke


Abstract: Standard LLM-based speech recognition systems typically process utterances in isolation, limiting their ability to leverage conversational context. In this work, we study whether multimodal context from prior turns improves LLM-based ASR and how to represent that context efficiently. We find that, after supervised multi-turn training, conversational context mainly helps with the recognition of contextual entities. However, conditioning on raw context is expensive because the prior-turn audio token sequence grows rapidly with conversation length. To address this, we propose Abstract Compression, which replaces the audio portion of prior turns with a fixed number of learned latent tokens while retaining corresponding transcripts explicitly. On both in-domain and out-of-domain test sets, the compressed model recovers part of the gains of raw-context conditioning with a smaller prior-turn audio footprint. We also provide targeted analyses of the compression setup and its trade-offs.
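To make the footprint argument concrete, here is a minimal bookkeeping sketch, assuming that under raw conditioning each prior turn contributes all of its audio tokens, while under the compressed layout each prior turn contributes a fixed number of latent tokens (`num_latents`) plus its transcript. The abstract does not state token rates, the number of latents, or how the latents are computed, so `Turn`, `num_latents`, the placeholder token names, and the example counts are all hypothetical. The sketch only counts and lays out tokens and does not model the learned compression itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Token counts for one prior conversational turn (illustrative only)."""
    audio_tokens: int       # audio tokens the speech encoder emits for this turn
    transcript_tokens: int  # text tokens in this turn's transcript


def raw_audio_footprint(prior: List[Turn]) -> int:
    # Raw-context conditioning: every prior turn keeps all of its audio tokens,
    # so the total grows with both the number and the duration of prior turns.
    return sum(t.audio_tokens for t in prior)


def compressed_audio_footprint(prior: List[Turn], num_latents: int) -> int:
    # Assumed compressed layout: each prior turn's audio is replaced by a fixed
    # number of latent tokens, independent of how long that turn's audio was.
    return num_latents * len(prior)


def build_compressed_context(prior: List[Turn], num_latents: int) -> List[str]:
    # Hypothetical symbolic sequence: for each prior turn, `num_latents` latent
    # slots followed by that turn's explicitly retained transcript tokens.
    seq: List[str] = []
    for i, turn in enumerate(prior):
        seq.extend(f"<latent_{i}_{k}>" for k in range(num_latents))
        seq.extend(f"<text_{i}_{j}>" for j in range(turn.transcript_tokens))
    return seq


if __name__ == "__main__":
    # Hypothetical token counts, not taken from the paper.
    prior = [Turn(250, 20), Turn(400, 35), Turn(150, 12)]
    k = 16
    print("raw prior-turn audio tokens:", raw_audio_footprint(prior))  # 800
    print("compressed prior-turn audio tokens:", compressed_audio_footprint(prior, k))  # 48
    print("compressed context length:", len(build_compressed_context(prior, k)))  # 115
```

With these made-up counts, adding a fourth prior turn of any duration raises the compressed audio footprint by exactly `num_latents` tokens, whereas the raw footprint grows by that turn's full audio length. This is the scaling difference the abstract describes.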

Comments: 11 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Cite as: arXiv:2603.26246 [cs.CL]

(or arXiv:2603.26246v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26246

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shashi Kumar
[v1] Fri, 27 Mar 2026 10:09:30 UTC (1,818 KB)
