Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling
arXiv:2604.00510v1 Announce Type: new
Abstract: Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations, such as positive early exit, reduce latency in favorable cases but are less effective when search continues without meaningful progress. We introduce negative early exit, which prunes unproductive MCTS trajectories, and an adaptive boosting mechanism that reallocates reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reasoning accuracy.
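The abstract's two mechanisms can be pictured with a minimal sketch. The stagnation criterion below (stop when the best value seen has not improved within a patience window) and the function names are illustrative assumptions, not the paper's actual algorithm; the point is only to show how a negative early exit reclaims rollout budget, and how that reclaimed budget can "boost" subsequent concurrent searches.

```python
def search_with_negative_exit(sample_value, budget, patience, eps=1e-9):
    """Toy stand-in for one MCTS search: run up to `budget` rollouts, but
    exit early (negative early exit) if the best value seen has not improved
    by more than `eps` in the last `patience` rollouts.
    Returns (best_value, rollouts_used); unused rollouts can be reclaimed."""
    best = float("-inf")
    since_improve = 0
    used = 0
    for _ in range(budget):
        v = sample_value()          # one simulated rollout's value
        used += 1
        if v > best + eps:
            best = v
            since_improve = 0
        else:
            since_improve += 1
        if since_improve >= patience:  # search has stagnated: prune it
            break
    return best, used

def run_pool(samplers, per_query_budget, patience):
    """Toy adaptive boosting: rollouts reclaimed by an early-exited query
    are granted to the next query in the pool instead of being discarded."""
    reclaimed = 0
    results = []
    for sample_value in samplers:
        budget = per_query_budget + reclaimed   # boost with reclaimed compute
        best, used = search_with_negative_exit(sample_value, budget, patience)
        reclaimed = budget - used               # carry leftover budget forward
        results.append((best, used))
    return results
```

With a flat value stream, the first search exits after `patience` non-improving rollouts and its leftover budget extends the second search's allowance.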
Subjects:
Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.00510 [cs.AI]
(or arXiv:2604.00510v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2604.00510
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Hongbeen Kim [view email] [v1] Wed, 1 Apr 2026 05:52:38 UTC (505 KB)