Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling
arXiv:2604.00510v1 Announce Type: new
Abstract: Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations, such as positive early exit, reduce latency in favorable cases but are less effective when search continues without meaningful progress. We introduce negative early exit, which prunes unproductive MCTS trajectories, and an adaptive boosting mechanism that reallocates reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reasoning accuracy.
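The abstract's two mechanisms can be pictured with a minimal sketch. The stagnation criterion below (stop when the best value seen has not improved within a patience window) and the function names are illustrative assumptions, not the paper's actual algorithm; the point is only to show how a negative early exit reclaims rollout budget, and how that reclaimed budget can "boost" subsequent concurrent searches.

```python
def search_with_negative_exit(sample_value, budget, patience, eps=1e-9):
    """Toy stand-in for one MCTS search: run up to `budget` rollouts, but
    exit early (negative early exit) if the best value seen has not improved
    by more than `eps` in the last `patience` rollouts.
    Returns (best_value, rollouts_used); unused rollouts can be reclaimed."""
    best = float("-inf")
    since_improve = 0
    used = 0
    for _ in range(budget):
        v = sample_value()          # one simulated rollout's value
        used += 1
        if v > best + eps:
            best = v
            since_improve = 0
        else:
            since_improve += 1
        if since_improve >= patience:  # search has stagnated: prune it
            break
    return best, used

def run_pool(samplers, per_query_budget, patience):
    """Toy adaptive boosting: rollouts reclaimed by an early-exited query
    are granted to the next query in the pool instead of being discarded."""
    reclaimed = 0
    results = []
    for sample_value in samplers:
        budget = per_query_budget + reclaimed   # boost with reclaimed compute
        best, used = search_with_negative_exit(sample_value, budget, patience)
        reclaimed = budget - used               # carry leftover budget forward
        results.append((best, used))
    return results
```

With a flat value stream, the first search exits after `patience` non-improving rollouts and its leftover budget extends the second search's allowance.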
Subjects:
Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.00510 [cs.AI]
(or arXiv:2604.00510v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2604.00510
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Hongbeen Kim [view email] [v1] Wed, 1 Apr 2026 05:52:38 UTC (505 KB)