
Known Intents, New Combinations: Clause-Factorized Decoding for Compositional Multi-Intent Detection

arXiv cs.CL · by Abhilash Nandy · April 1, 2026


Abstract: Multi-intent detection papers usually ask whether a model can recover multiple intents from one utterance. We ask a harder and, for deployment, more useful question: can it recover new combinations of familiar intents? Existing benchmarks only weakly test this, because train and test often share the same broad co-occurrence patterns. We introduce CoMIX-Shift, a controlled benchmark built to stress compositional generalization in multi-intent detection through held-out intent pairs, discourse-pattern shift, longer and noisier wrappers, held-out clause templates, and zero-shot triples. We also present ClauseCompose, a lightweight decoder trained only on singleton intents, and compare it to whole-utterance baselines including a fine-tuned tiny BERT model. Across three random seeds, ClauseCompose reaches 95.7 exact match on unseen intent pairs, 93.9 on discourse-shifted pairs, 62.5 on longer/noisier pairs, 49.8 on held-out templates, and 91.1 on unseen triples. WholeMultiLabel reaches 81.4, 55.7, 18.8, 15.5, and 0.0; the BERT baseline reaches 91.5, 77.6, 48.9, 11.0, and 0.0. We also add a 240-example manually authored SNIPS-style compositional set with five held-out pairs; there, ClauseCompose reaches 97.5 exact match on unseen pairs and 86.7 under connector shift, compared with 41.3 and 10.4 for WholeMultiLabel. The results suggest that multi-intent detection needs more compositional evaluation, and that simple factorization goes surprisingly far once evaluation asks for it.
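The abstract does not say how ClauseCompose segments an utterance or scores each clause, so the following is only a minimal sketch of the general idea of clause-factorized decoding with exact-match scoring, not the paper's implementation. The connector pattern, the intent names, the keyword classifier, and every function name here are hypothetical stand-ins chosen for illustration. The sketch splits an utterance into clauses, labels each clause with a classifier that only knows single intents, and takes the union of the labels as the predicted intent set.

```python
import re
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical connector pattern (an assumption, not from the paper).
# Longer connectors come before "and" so "and then" is consumed whole.
CONNECTORS = re.compile(
    r"\s*[,;]?\s*\b(?:and then|and also|and|then)\b\s*|\s*;\s*",
    flags=re.IGNORECASE,
)

# Hypothetical intents and cue words standing in for a trained singleton model.
KEYWORDS = {
    "play_music": ("play",),
    "set_alarm": ("alarm",),
    "get_weather": ("weather",),
}


def split_clauses(utterance: str) -> List[str]:
    """Split an utterance into non-empty clauses at connector words."""
    return [c.strip() for c in CONNECTORS.split(utterance) if c and c.strip()]


def toy_singleton_classifier(clause: str) -> str:
    """Return one intent per clause; it never sees intent combinations."""
    lowered = clause.lower()
    for intent, cues in KEYWORDS.items():
        if any(cue in lowered for cue in cues):
            return intent
    return "unknown"


def clause_factorized_decode(
    utterance: str, classify: Callable[[str], str]
) -> FrozenSet[str]:
    """Predict the intent set as the union of per-clause predictions."""
    return frozenset(classify(clause) for clause in split_clauses(utterance))


def exact_match(
    predictions: Iterable[Iterable[str]], golds: Iterable[Iterable[str]]
) -> float:
    """Percentage of utterances whose predicted set equals the gold set."""
    pairs = list(zip(predictions, golds))
    hits = sum(1 for pred, gold in pairs if frozenset(pred) == frozenset(gold))
    return 100.0 * hits / len(pairs)


if __name__ == "__main__":
    utterances = [
        "play some jazz and then set an alarm for 7am",
        "what's the weather, and also play my playlist",
        "set an alarm; check the weather; play a song",
    ]
    golds = [
        {"play_music", "set_alarm"},
        {"get_weather", "play_music"},
        {"set_alarm", "get_weather", "play_music"},
    ]
    preds = [clause_factorized_decode(u, toy_singleton_classifier) for u in utterances]
    print(exact_match(preds, golds))
```

Because the classifier only ever labels one clause at a time, a pair or triple that never co-occurred in training poses no special difficulty in this sketch, provided the segmentation step holds up. A whole-utterance multi-label model, by contrast, has to have learned the combination itself. Exact match is strict: a prediction with one missing or extra intent counts as a miss.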

Comments: 6 pages, 3 tables

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.28929 [cs.CL]

(or arXiv:2603.28929v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.28929

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Abhilash Nandy
[v1] Mon, 30 Mar 2026 19:06:54 UTC (23 KB)
