
Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

arXiv cs.MA, by Rupal Nigam, Niket Parikh, Hamid Osooli, Mikihisa Yuasa, Jacob Heglund, Huy T. Tran, April 1, 2026

Abstract: Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.
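
The abstract names its two building blocks but does not spell out how GPAT combines them. The sketch below shows only their standard textbook forms, under stated assumptions, and is not the authors' implementation. Generalized policy improvement (GPI) picks the action that maximizes the best Q-value across all pretrained policies, pi(s) = argmax_a max_i Q_i(s, a). A difference reward credits agent i with the team reward minus a counterfactual team reward in which its action is replaced by a default, D_i = G(z) - G(z_{-i} + c_i). The tabular Q-tables, the helpers `gpi_action` and `difference_reward`, the no-op default action, and the toy foraging reward are all hypothetical illustrations.

```python
from typing import Callable, Hashable, Mapping, Sequence

# Hypothetical tabular Q-values for one pretrained policy: {(state, action): estimated return}.
QTable = Mapping[tuple, float]


def gpi_action(q_tables: Sequence[QTable], state: Hashable,
               actions: Sequence[Hashable]) -> Hashable:
    """Generalized policy improvement (GPI): act greedily with respect to the
    pointwise maximum of all pretrained Q-functions,
        pi(s) = argmax_a max_i Q_i(s, a).
    Python's max() returns the first maximal item, so ties favor earlier actions."""
    return max(actions, key=lambda a: max(q[(state, a)] for q in q_tables))


def difference_reward(global_reward: Callable[[tuple], float],
                      joint_action: tuple, agent: int,
                      default_action: Hashable) -> float:
    """Difference reward D_i = G(z) - G(z_{-i} + c_i): the team reward minus the
    counterfactual team reward obtained when agent i's action is replaced by a
    default action c_i (here assumed to be a no-op)."""
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    return global_reward(joint_action) - global_reward(tuple(counterfactual))


# Usage with made-up numbers: two pretrained policies, one state, three actions.
q_a = {("s0", 0): 1.0, ("s0", 1): 0.0, ("s0", 2): 0.5}
q_b = {("s0", 0): 0.2, ("s0", 1): 2.0, ("s0", 2): 0.1}
print(gpi_action([q_a, q_b], "s0", [0, 1, 2]))  # 1: q_b's value for action 1 is the largest


# Toy foraging reward (hypothetical): number of distinct items picked up; None means no-op.
def items_collected(joint_action: tuple) -> int:
    return len({a for a in joint_action if a is not None})


joint = (0, 0, 1)  # agents 0 and 1 both target item 0; agent 2 takes item 1
print([difference_reward(items_collected, joint, i, None) for i in range(3)])
# [0, 0, 1]: only agent 2's action changes the team reward
```

Taking the maximum over every pretrained Q-function, rather than first choosing one of them, is what lets an agent draw on all pretrained policies at once, which is the contrast the abstract draws with prior work. The difference reward, in turn, isolates each agent's own contribution from whatever its teammates happen to do.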

Comments: 10 pages, 8 figures. To appear in the proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Cite as: arXiv:2510.16187 [cs.MA]

(or arXiv:2510.16187v2 [cs.MA] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.16187 (arXiv-issued DOI via DataCite)

Related DOI: https://doi.org/10.65109/TNEX7143

Submission history

From: Rupal Nigam
[v1] Fri, 17 Oct 2025 19:55:25 UTC (1,805 KB)
[v2] Tue, 31 Mar 2026 17:21:11 UTC (5,734 KB)