Models model language model training announce open-source application

Asymmetric Actor-Critic for Multi-turn LLM Agents

arXiv cs.CLby Shuli Jiang, Zhaoyang Zhang, Yi Zhang, Shuo Yang, Wei Xia, Stefano SoattoApril 2, 20261 min read1 views

arXiv:2604.00304v1 Announce Type: new Abstract: Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based acto

View PDF

Abstract:Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based actor-critic methods, our framework supervises a fixed actor operating in open-ended conversational environments. The design leverages a generation-verification asymmetry: while high-quality generation requires large models, effective oversight can often be achieved by smaller ones. We further introduce a data generation pipeline that produces supervision signals for critic fine-tuning without modifying the actor. Experiments on $\tau$-bench and UserBench show that our approach significantly improves reliability and task success over strong single-agent baselines. Moreover, lightweight open-source critics rival or surpass larger proprietary models in the critic role, and critic fine-tuning yields additional gains over several state-of-the-art methods.

Comments: 19 pages

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2604.00304 [cs.CL]

(or arXiv:2604.00304v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.00304

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shuli Jiang [view email] [v1] Tue, 31 Mar 2026 22:56:21 UTC (1,640 KB)

Original source

arXiv cs.CL

https://arxiv.org/abs/2604.00304

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modellanguage modeltraining

ReleasesLive

Microsoft just shipped the clearest signal yet that it is building an AI empire without OpenAI

Six months after renegotiating the contract that once barred it from independently pursuing frontier AI, Microsoft has released three in-house models that directly challenge the partner it spent $13 billion cultivating. MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 are now available in Microsoft Foundry, and they do not carry OpenAI’s name anywhere on the label. The models are [ ] This story continues at The Next Web

The Next Web Neural

5mabout 1 hour ago

Models

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models WSJ

Google News: LLM

1m3 days ago

Models

The “Super Weight:” How Even a Single Parameter can Determine a Large Language Model’s Behavior - Apple Machine Learning Research

The “Super Weight:” How Even a Single Parameter can Determine a Large Language Model’s Behavior Apple Machine Learning Research

Google News: LLM

1m8 months ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 158 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

Asymmetric Actor-Critic for Multi-turn LLM Agents

Submission history

Daily AI Digest

More about

Microsoft just shipped the clearest signal yet that it is building an AI empire without OpenAI

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ

The “Super Weight:” How Even a Single Parameter can Determine a Large Language Model’s Behavior - Apple Machine Learning Research

Knowledge Map

Connected Articles — Knowledge Graph

Discussion

More in Models

Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ

The “Super Weight:” How Even a Single Parameter can Determine a Large Language Model’s Behavior - Apple Machine Learning Research

Exclusive | The Sudden Fall of OpenAI’s Most Hyped Product Since ChatGPT - WSJ

Anthropic Races to Contain Leak of Code Behind Claude AI Agent - WSJ