ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

arXiv cs.LG · [Submitted on 31 Mar 2026] · April 2, 2026
Abstract: Production LLM serving often relies on multi-model portfolios spanning a ~530x cost range, where routing decisions trade off quality against cost. This trade-off is non-stationary: providers revise pricing, model quality can regress silently, and new models must be integrated without downtime. We present ParetoBandit, an open-source adaptive router built on cost-aware contextual bandits that is the first to simultaneously enforce dollar-denominated budgets, adapt online to such shifts, and onboard new models at runtime. ParetoBandit closes these gaps through three mechanisms. An online primal-dual budget pacer enforces a per-request cost ceiling over an open-ended stream, replacing offline penalty tuning with closed-loop control. Geometric forgetting on sufficient statistics enables rapid adaptation to price and quality shifts while bootstrapping from offline priors. A hot-swap registry lets operators add or remove models at runtime, with a brief forced-exploration phase for each newcomer, after which UCB selection discovers its quality-cost niche from live traffic alone. We evaluate ParetoBandit across four deployment scenarios on 1,824 prompts routed through a three-model portfolio. Across seven budget ceilings, mean per-request cost never exceeds the target by more than 0.4%. When conditions shift, the system adapts: an order-of-magnitude price cut on the costliest model yields up to +0.071 quality lift, and a silent quality regression is detected and rerouted within budget. A cold-started model reaches meaningful adoption within ~142 steps without breaching the cost ceiling. The router discriminates rather than blindly adopting: expensive models are budget-gated and low-quality models rejected after bounded exploration. End-to-end routing latency is 9.8 ms on CPU (less than 0.4% of typical inference time), with the routing decision itself taking just 22.5 µs.
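
To make the three mechanisms named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a non-contextual bandit (the paper uses contextual bandits), keeps only a discounted count and reward sum per model, discounts only the model that was just used, and scores each model as mean quality plus a UCB-style bonus minus a dual-variable penalty on cost normalized by the budget. The names PacedRouter, Arm, add_model, remove_model and simulate, and the values chosen for gamma, eta, alpha and forced_left, are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    cost: float             # dollars per request (illustrative value)
    n: float = 0.0          # geometrically discounted pull count
    s: float = 0.0          # geometrically discounted reward sum
    forced_left: int = 3    # remaining forced-exploration pulls (assumed length)


class PacedRouter:
    """Hypothetical sketch: UCB selection + forgetting + primal-dual pacing."""

    def __init__(self, budget, gamma=0.99, eta=0.05, alpha=0.5):
        self.budget = budget    # target mean cost per request
        self.gamma = gamma      # forgetting factor in (0, 1]
        self.eta = eta          # step size of the dual update
        self.alpha = alpha      # scale of the exploration bonus
        self.lam = 0.0          # dual variable: current "price" of spending
        self.arms = {}

    def add_model(self, name, cost):
        # Hot-swap: a newcomer starts with empty statistics and forced pulls.
        self.arms[name] = Arm(cost=cost)

    def remove_model(self, name):
        self.arms.pop(name, None)

    def select(self):
        # Forced exploration takes priority over scoring.
        for name, arm in self.arms.items():
            if arm.forced_left > 0:
                return name

        def score(name):
            arm = self.arms[name]
            mean = arm.s / arm.n                    # discounted mean quality
            bonus = self.alpha / math.sqrt(arm.n)   # optimism for rarely used arms
            return mean + bonus - self.lam * arm.cost / self.budget

        return max(self.arms, key=score)

    def update(self, name, reward):
        arm = self.arms[name]
        # Geometric forgetting on the sufficient statistics of the pulled arm.
        arm.n = self.gamma * arm.n + 1.0
        arm.s = self.gamma * arm.s + reward
        arm.forced_left = max(0, arm.forced_left - 1)
        # Dual ascent: raise the price when over budget, lower it when under.
        self.lam = max(0.0, self.lam + self.eta * (arm.cost / self.budget - 1.0))


def simulate(router, quality, steps):
    """Route `steps` requests with fixed per-model quality; return (mean cost, counts)."""
    total_cost, counts = 0.0, {}
    for _ in range(steps):
        name = router.select()
        router.update(name, quality[name])
        total_cost += router.arms[name].cost
        counts[name] = counts.get(name, 0) + 1
    return total_cost / steps, counts
```

In this sketch the budget is enforced by the feedback loop on lam rather than by a hand-tuned penalty: each over-budget request raises lam, which makes costly models less attractive on the next decision, and each under-budget request lowers it again. A model added mid-stream with add_model receives its forced pulls and is then kept or dropped by the same scoring rule.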

Comments: 27 pages, 15 figures, 13 tables. Code available at this https URL

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

MSC classes: 68T05, 62L05

ACM classes: I.2.6; I.2.11; C.4

Cite as: arXiv:2604.00136 [cs.LG] (or arXiv:2604.00136v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2604.00136

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Annette Taberner-Miller
[v1] Tue, 31 Mar 2026 18:41:53 UTC (6,181 KB)
