
Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

arXiv cs.MA
by Rupal Nigam, Niket Parikh, Hamid Osooli, Mikihisa Yuasa, Jacob Heglund, Huy T. Tran
April 1, 2026




Abstract: Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.
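The abstract names generalized policy improvement (GPI) and difference rewards without spelling them out. As a rough illustration only, and not the authors' GPAT implementation, the sketch below uses the standard textbook forms of both ideas: GPI acts greedily on the maximum over a library of pretrained action-value functions, and a difference reward is the team reward minus the team reward obtained when one agent's action is swapped for a default. The tabular Q-values, the `toy_team_reward` function, and the choice of action 0 as a no-op default are all hypothetical and exist only to make the example runnable.

```python
from typing import Callable, List, Sequence

# Hypothetical layout: QTable[state][action] -> estimated return.
QTable = List[List[float]]


def gpi_action(q_tables: Sequence[QTable], state: int) -> int:
    """Standard GPI rule: argmax over actions of max over pretrained Q-functions."""
    n_actions = len(q_tables[0][state])
    # For each action, keep the most optimistic estimate across the policy library.
    best_per_action = [max(q[state][a] for q in q_tables) for a in range(n_actions)]
    return max(range(n_actions), key=best_per_action.__getitem__)


def difference_reward(
    team_reward: Callable[[Sequence[int]], float],
    joint_action: Sequence[int],
    agent: int,
    default_action: int,
) -> float:
    """Team reward minus the counterfactual reward with one agent's action replaced."""
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    return team_reward(joint_action) - team_reward(counterfactual)


def toy_team_reward(joint_action: Sequence[int]) -> float:
    """Toy assumption: reward is the number of distinct items picked (0 = no-op)."""
    return float(len({a for a in joint_action if a != 0}))


# Two pretrained Q-tables over one state and three actions (made-up numbers).
library = [[[1.0, 0.0, 0.2]], [[0.1, 0.3, 2.0]]]
print(gpi_action(library, state=0))  # 2: the second table rates action 2 highest

# Agents 0 and 1 pick the same item, so agent 0 adds nothing; agent 2 adds one item.
print(difference_reward(toy_team_reward, (1, 1, 2), agent=0, default_action=0))  # 0.0
print(difference_reward(toy_team_reward, (1, 1, 2), agent=2, default_action=0))  # 1.0
```

In this toy case the difference reward credits only the agent whose action changed the team outcome, which is the general motivation for using it as a per-agent signal. How GPAT combines the two ideas is described in the paper itself.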

Comments: 10 pages, 8 figures. To appear in the proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Cite as: arXiv:2510.16187 [cs.MA]

(or arXiv:2510.16187v2 [cs.MA] for this version)

https://doi.org/10.48550/arXiv.2510.16187

arXiv-issued DOI via DataCite

Related DOI: https://doi.org/10.65109/TNEX7143

DOI(s) linking to related resources

Submission history

From: Rupal Nigam
[v1] Fri, 17 Oct 2025 19:55:25 UTC (1,805 KB)
[v2] Tue, 31 Mar 2026 17:21:11 UTC (5,734 KB)

Original source

arXiv cs.MA

https://arxiv.org/abs/2510.16187
