Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessDutchess to host artificial intelligence summit at Marist in Poughkeepsie - Daily FreemanGoogle News: AIAnthropic’s Catastrophic Leak May Have Just Handed China the Blueprints to Claude Al - TipRanksGoogle News: ClaudeMeta's AI push is reshaping how work gets done inside the companyBusiness InsiderOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-Up - WIREDGoogle News: OpenAIAI & Tech brief: Ireland ascendant - The Washington PostGNews AI EUPeople would rather have an Amazon warehouse in their backyard than a data centerTechCrunch AITake-Two lays off its head of AI and several team members just two months after the CEO said it was embracing Gen AI - TweakTownGoogle News: Generative AIOpenAI Buys TBPN Tech Talk Show for Enterprise Client Outreach - News and Statistics - IndexBoxGoogle News: OpenAILenovo Legion Go 2 suddenly costs $650 more as RAMageddon lays waste to gaming hardwareThe VergeResearchers Discover How to Add Psilocybin, DMT, and Other Psychedelics to TobaccoGizmodoOpenAI’s Top Executive Fidji Simo To Take Medical Leave from Company - WSJGoogle News: OpenAIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessDutchess to host artificial intelligence summit at Marist in Poughkeepsie - Daily FreemanGoogle News: AIAnthropic’s Catastrophic Leak May Have Just Handed China the Blueprints to Claude Al - TipRanksGoogle News: ClaudeMeta's AI push is reshaping how work gets done inside the companyBusiness InsiderOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-Up - WIREDGoogle News: OpenAIAI & Tech brief: Ireland ascendant - The Washington PostGNews AI EUPeople would rather have an Amazon warehouse in their backyard than a data centerTechCrunch AITake-Two lays off its head of AI and several team members just two months after the CEO said it was embracing Gen AI - TweakTownGoogle News: Generative AIOpenAI Buys TBPN Tech Talk Show for Enterprise Client Outreach - News and Statistics - IndexBoxGoogle News: OpenAILenovo Legion Go 2 suddenly costs $650 more as RAMageddon lays waste to gaming hardwareThe VergeResearchers Discover How to Add Psilocybin, DMT, and Other Psychedelics to TobaccoGizmodoOpenAI’s Top Executive Fidji Simo To Take Medical Leave from Company - WSJGoogle News: OpenAI
AI NEWS HUBbyEIGENVECTOREigenvector

Asymmetric Actor-Critic for Multi-turn LLM Agents

arXiv cs.CLby Shuli Jiang, Zhaoyang Zhang, Yi Zhang, Shuo Yang, Wei Xia, Stefano SoattoApril 2, 20261 min read1 views
Source Quiz

arXiv:2604.00304v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based acto

View PDF

Abstract:Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based actor-critic methods, our framework supervises a fixed actor operating in open-ended conversational environments. The design leverages a generation-verification asymmetry: while high-quality generation requires large models, effective oversight can often be achieved by smaller ones. We further introduce a data generation pipeline that produces supervision signals for critic fine-tuning without modifying the actor. Experiments on $\tau$-bench and UserBench show that our approach significantly improves reliability and task success over strong single-agent baselines. Moreover, lightweight open-source critics rival or surpass larger proprietary models in the critic role, and critic fine-tuning yields additional gains over several state-of-the-art methods.

Comments: 19 pages

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2604.00304 [cs.CL]

(or arXiv:2604.00304v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.00304

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shuli Jiang [view email] [v1] Tue, 31 Mar 2026 22:56:21 UTC (1,640 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Asymmetric …modellanguage mo…trainingannounceopen-sourceapplicationarXiv cs.CL

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 158 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!