Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models
Full-duplex speech language models require high-quality multi-speaker conversational data, which is scarce, necessitating a robust open-source data processing pipeline to address challenges in natural dialogue dynamics and system accuracy.
Published on Mar 20
Abstract
As the paradigm of AI shifts from text-based LLMs to Speech Language Models (SLMs), there is a growing demand for full-duplex systems capable of real-time, natural human-computer interaction. However, the development of such models is constrained by the scarcity of high-quality, multi-speaker conversational data, as existing large-scale resources are predominantly single-speaker or limited in volume. Addressing the complex dynamics of natural dialogue, such as overlapping speech and back-channeling, remains a challenge, with standard processing pipelines suffering from diarization errors and ASR hallucinations. To bridge this gap, we present a robust and scalable open-source data processing pipeline designed for full-duplex models.
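To make the dialogue dynamics mentioned in the abstract concrete, here is a minimal sketch of how overlapping speech and back-channel candidates could be flagged from diarization output. It is not the paper's pipeline: the `Segment` type, the function names, and the 1.0-second duration threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarized speech segment (hypothetical structure, times in seconds)."""
    speaker: str
    start: float
    end: float


def find_overlaps(segments):
    """Return (speaker_a, speaker_b, start, end) for each cross-speaker overlap."""
    ordered = sorted(segments, key=lambda s: s.start)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Segments are sorted by start time, so once b begins at or
            # after a ends, no later segment can overlap a.
            if b.start >= a.end:
                break
            if a.speaker != b.speaker:
                overlaps.append((a.speaker, b.speaker, b.start, min(a.end, b.end)))
    return overlaps


def is_backchannel_candidate(seg, segments, max_duration=1.0):
    """Heuristic (an assumption, not from the paper): a short segment that lies
    entirely inside another speaker's turn is treated as a back-channel candidate."""
    if seg.end - seg.start > max_duration:
        return False
    return any(
        other.speaker != seg.speaker
        and other.start <= seg.start
        and seg.end <= other.end
        for other in segments
    )
```

In this sketch, a short "mm-hm" from speaker B inside a long turn by speaker A would be reported both as an overlap and as a back-channel candidate, whereas a longer interruption would only appear in the overlap list.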
arXiv: arxiv.org/abs/2603.25750