JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems
Authors: Guangzhao Yang, Yu Pan, Shi Qiu, Ningjie Bai
Abstract: Despite recent advances, efficient and robust turn-taking detection remains a significant challenge in industrial-grade Voice AI agent deployments. Many existing systems rely solely on acoustic or semantic cues, leading to suboptimal accuracy and stability, while recent attempts to endow large language models with full-duplex capabilities require costly full-duplex data and incur substantial training and deployment overheads, limiting real-time performance. In this paper, we propose JAL-Turn, a lightweight and efficient speech-only turn-taking framework that adopts a joint acoustic-linguistic modeling paradigm, in which a cross-attention module adaptively integrates pre-trained acoustic representations with linguistic features to support low-latency prediction of hold vs. shift states. By sharing a frozen ASR encoder, JAL-Turn enables turn-taking prediction to run fully in parallel with speech recognition, introducing no additional end-to-end latency or computational overhead. In addition, we introduce a scalable data construction pipeline that automatically derives reliable turn-taking labels from large-scale real-world dialogue corpora. Extensive experiments on public multilingual benchmarks and an in-house Japanese customer-service dataset show that JAL-Turn consistently outperforms strong state-of-the-art baselines in detection accuracy while maintaining superior real-time performance.
Comments: 8 pages, in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.26515 [cs.CL]
(or arXiv:2603.26515v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.26515
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yu Pan
[v1] Fri, 27 Mar 2026 15:25:38 UTC (1,136 KB)
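
The abstract describes a cross-attention module that fuses acoustic representations with linguistic features to predict hold vs. shift. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. The abstract does not say which modality supplies the queries, what the feature dimensions are, or how the prediction head is built, so everything here is an assumption: acoustic frames act as queries, linguistic vectors act as keys and values, learned projections are omitted, and the names `cross_attend` and `shift_probability` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attend(acoustic, linguistic):
    """Each acoustic frame (query) attends over linguistic vectors (keys/values).

    Assumption: learned Q/K/V projections are left out to keep the sketch short.
    """
    scale = math.sqrt(len(acoustic[0]))
    fused = []
    for frame in acoustic:
        weights = softmax([dot(frame, tok) / scale for tok in linguistic])
        context = [sum(w * tok[i] for w, tok in zip(weights, linguistic))
                   for i in range(len(frame))]
        # Residual connection: keep the acoustic evidence, add linguistic context.
        fused.append([a + c for a, c in zip(frame, context)])
    return fused


def shift_probability(acoustic, linguistic, head_weights, head_bias=0.0):
    """Mean-pool the fused frames and apply a logistic head to get P(shift)."""
    fused = cross_attend(acoustic, linguistic)
    dim = len(fused[0])
    pooled = [sum(f[i] for f in fused) / len(fused) for i in range(dim)]
    logit = dot(pooled, head_weights) + head_bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Random stand-ins for frozen-encoder outputs and linguistic features.
    acoustic = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
    linguistic = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
    head = [rng.uniform(-1, 1) for _ in range(dim)]
    p_shift = shift_probability(acoustic, linguistic, head)
    print("shift" if p_shift >= 0.5 else "hold", round(p_shift, 3))
```

In a real system the two inputs would presumably come from the shared frozen ASR encoder mentioned in the abstract, and the head weights would be trained on the automatically derived hold/shift labels. The random vectors above only exercise the data flow.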