Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning
Authors: Stephen Wissow, Masataro Asai
Abstract: Balancing exploration and exploitation has been an important problem in both game tree search and automated planning. However, while the problem has been extensively analyzed within the Multi-Armed Bandit (MAB) literature, the planning community has had limited success when attempting to apply those results. We show that a more detailed theoretical understanding of the MAB literature helps improve existing planning algorithms that are based on Monte Carlo Tree Search (MCTS) / Trial Based Heuristic Tree Search (THTS). In particular, THTS uses UCB1 MAB algorithms in an ad hoc manner, as UCB1's theoretical requirement of fixed bounded support reward distributions is not satisfied within heuristic search for classical planning. The core issue lies in UCB1's lack of adaptation to the different scales of the rewards. We propose GreedyUCT-Normal, an MCTS/THTS algorithm with the UCB1-Normal bandit for agile classical planning, which handles distributions with different scales by taking the reward variance into consideration. It achieves improved algorithmic performance (more plans found with fewer node expansions) and outperforms Greedy Best First Search and existing MCTS/THTS-based algorithms (GreedyUCT, GreedyUCT*).
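To make the scale issue concrete, the following is a minimal sketch of the two exploration bonuses as they are usually stated in the bandit literature (reward-maximization form). It is not the authors' implementation: the function names `ucb1_bonus` and `ucb1_normal_bonus`, the sample rewards, and the pull counts are hypothetical and chosen only for illustration. The UCB1 bonus depends only on pull counts, so it implicitly assumes rewards in a fixed bounded range, while the UCB1-Normal bonus is proportional to the sample standard deviation and therefore follows the scale of the rewards.

```python
import math


def ucb1_bonus(n_arm, n_total):
    """UCB1 exploration bonus: depends only on counts, not on reward scale."""
    return math.sqrt(2.0 * math.log(n_total) / n_arm)


def ucb1_normal_bonus(rewards, n_total):
    """UCB1-Normal exploration bonus: scales with the sample variance.

    Requires at least two samples for the arm and n_total >= 2.
    """
    n_arm = len(rewards)
    mean = sum(rewards) / n_arm
    sum_sq = sum(r * r for r in rewards)
    # Unbiased sample variance, clamped at zero against rounding error.
    variance = max(0.0, (sum_sq - n_arm * mean * mean) / (n_arm - 1))
    return math.sqrt(16.0 * variance * math.log(n_total - 1) / n_arm)


# Hypothetical rewards for one arm, and the same rewards on a 100x larger scale.
small_scale = [0.1, 0.2, 0.3, 0.2]
large_scale = [r * 100 for r in small_scale]
n_total = 20  # hypothetical total number of pulls across all arms

ucb1_small = ucb1_bonus(len(small_scale), n_total)
ucb1_large = ucb1_bonus(len(large_scale), n_total)
normal_small = ucb1_normal_bonus(small_scale, n_total)
normal_large = ucb1_normal_bonus(large_scale, n_total)

print("UCB1 bonus (small, large):", ucb1_small, ucb1_large)
print("UCB1-Normal bonus (small, large):", normal_small, normal_large)
```

In this sketch the UCB1 bonus is identical for both reward scales, so it is relatively too large for the small-scale arm or too small for the large-scale one, whereas the UCB1-Normal bonus grows by the same factor of 100 as the rewards.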
Comments: Outstanding paper award in ECAI 2024
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2305.09840 [cs.AI]
(or arXiv:2305.09840v4 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2305.09840
arXiv-issued DOI via DataCite
Submission history
From: Masataro Asai
[v1] Tue, 16 May 2023 22:46:37 UTC (249 KB)
[v2] Mon, 3 Jul 2023 20:00:03 UTC (1,744 KB)
[v3] Fri, 30 Aug 2024 15:57:01 UTC (3,722 KB)
[v4] Thu, 26 Mar 2026 19:23:28 UTC (3,723 KB)