
Extreme Value Monte Carlo Tree Search for Classical Planning

arXiv, March 30, 2026

Authors: Masataro Asai, Stephen Wissow


Abstract: Despite being successful in board games and reinforcement learning (RL), Monte Carlo Tree Search (MCTS) combined with Multi-Armed Bandits (MABs) has seen limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai 2024) showed that UCB1, designed for bounded rewards, performs poorly when applied to cost-to-go estimates in classical planning, which are unbounded in $\mathbb{R}$, and showed improved performance using a Gaussian reward MAB instead. This paper further sharpens our understanding of ideal bandits for planning tasks. Existing work has two issues. First, Gaussian MABs under-specify the support of cost-to-go estimates as $(-\infty,\infty)$, which we can narrow down. Second, Full Bellman backup (Schulte and Keller 2014), which backpropagates the sample max/min, lacks theoretical justification. We use Peaks-Over-Threshold Extreme Value Theory to resolve both issues at once, and propose a new bandit algorithm (UCB1-Uniform). We formally prove its regret bound and empirically demonstrate its performance in classical planning.
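To make the abstract's point about bounded rewards concrete, here is a minimal sketch of the textbook UCB1 selection rule (Auer et al. 2002), not the paper's UCB1-Uniform, whose formula the abstract does not give. The function name `ucb1_select` and the numbers are illustrative assumptions, and arms are treated as rewards to maximize for simplicity, whereas planning heuristics are costs to minimize. The sketch shows that the fixed-scale exploration bonus stops mattering once reward magnitudes grow, which is one way to see why a bandit designed for rewards in [0, 1] can behave badly on unbounded cost-to-go estimates.

```python
import math


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Textbook UCB1: pick the arm maximising mean_i + c * sqrt(ln(N) / n_i).

    counts[i] is the number of pulls of arm i, means[i] its average reward.
    Illustrative helper only; this is not the paper's UCB1-Uniform.
    """
    # Pull every arm once before the confidence bonus is defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# Rewards on a [0, 1] scale: the rarely tried arm 1 wins on its bonus.
small_scale = ucb1_select([100, 1], [0.5, 0.4])

# Same arms with rewards multiplied by 100: the bonus is now negligible,
# so the rule keeps exploiting arm 0 and arm 1 is not explored.
large_scale = ucb1_select([100, 1], [50.0, 40.0])

print(small_scale, large_scale)
```

The bonus term depends only on visit counts, so it implicitly assumes a known reward range. Rescaling the rewards changes the exploration behavior even though the ranking of the arms is unchanged.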

Comments: Accepted in AAAI-26. arXiv admin note: substantial text overlap with arXiv:2305.09840 (background section)

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2405.18248 [cs.AI]

(or arXiv:2405.18248v3 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2405.18248

arXiv-issued DOI via DataCite

Submission history

From: Masataro Asai
[v1] Tue, 28 May 2024 14:58:43 UTC (1,567 KB)
[v2] Mon, 17 Nov 2025 16:42:45 UTC (1,858 KB)
[v3] Thu, 26 Mar 2026 19:08:54 UTC (164 KB)

Original source

arXiv

https://arxiv.org/abs/2405.18248
