Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessRightNow AI Releases AutoKernel: An Open-Source Framework that Applies an Autonomous Agent Loop to GPU Kernel Optimization for Arbitrary PyTorch ModelsMarkTechPostChinese AI rivals clash over Anthropic’s OpenClaw exit amid global token crunchSCMP Tech (Asia AI)India turns to Iran for oil and gas after 7-year hiatus, signaling limits to U.S. tiltCNBC TechnologyAirAsia X hikes ticket prices by 40%, cut capacity by 10% as Iran war hits fuel costsSCMP Tech (Asia AI)YouTube blokkeert Nvidia s DLSS 5-video na auteursclaim Italiaanse tv-zenderTweakers.netWhat are the differences between pipelines and models in Hugging Face?discuss.huggingface.coAI Mastery Course in Telugu: Hands-On Training with Real ProjectsDev.to AIHow I'm Running Autonomous AI Agents That Actually Earn USDCDev.to AIUnderstanding NLP Token Classification: From Basics to Real-World ApplicationsMedium AIGPT-5.4 Scored 75% on a Test That Measures Real Human Work. My Data Team Scored 72%.Medium AIBizNode Workflow Marketplace: chain multiple bot handles into multi-step pipelines. Client onboarding, contract-to-payment,...Dev.to AITop Artificial Intelligence Development Companies in Dubai, UAE (2026 Edition)Medium AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessRightNow AI Releases AutoKernel: An Open-Source Framework that Applies an Autonomous Agent Loop to GPU Kernel Optimization for Arbitrary PyTorch ModelsMarkTechPostChinese AI rivals clash over Anthropic’s OpenClaw exit amid global token crunchSCMP Tech (Asia AI)India turns to Iran for oil and gas after 7-year hiatus, signaling limits to U.S. tiltCNBC TechnologyAirAsia X hikes ticket prices by 40%, cut capacity by 10% as Iran war hits fuel costsSCMP Tech (Asia AI)YouTube blokkeert Nvidia s DLSS 5-video na auteursclaim Italiaanse tv-zenderTweakers.netWhat are the differences between pipelines and models in Hugging Face?discuss.huggingface.coAI Mastery Course in Telugu: Hands-On Training with Real ProjectsDev.to AIHow I'm Running Autonomous AI Agents That Actually Earn USDCDev.to AIUnderstanding NLP Token Classification: From Basics to Real-World ApplicationsMedium AIGPT-5.4 Scored 75% on a Test That Measures Real Human Work. My Data Team Scored 72%.Medium AIBizNode Workflow Marketplace: chain multiple bot handles into multi-step pipelines. Client onboarding, contract-to-payment,...Dev.to AITop Artificial Intelligence Development Companies in Dubai, UAE (2026 Edition)Medium AI
AI NEWS HUBbyEIGENVECTOREigenvector

Fast Best-in-Class Regret for Contextual Bandits

arXiv stat.MLby Samuel Girard, Aurelien Bibaut, Arthur Gretton, Nathan Kallus, Houssam ZenatiApril 6, 20261 min read0 views
Source Quiz

arXiv:2510.15483v2 Announce Type: replace Abstract: We study the problem of stochastic contextual bandits in the agnostic setting, where the goal is to compete with the best policy in a given class without assuming realizability or imposing model restrictions on losses or rewards. In this work, we establish the first fast rate for regret relative to the best-in-class policy. Our proposed algorithm updates the policy at every round by minimizing a pessimistic objective, defined as a clipped inverse-propensity estimate of the policy value plus a variance penalty. By leveraging entropy assumptions on the policy class and a H\"olderian error-bound condition (a generalization of the margin condition), we achieve fast best-in-class regret rates, including polylogarithmic rates in the parametric

View PDF HTML (experimental)

Abstract:We study the problem of stochastic contextual bandits in the agnostic setting, where the goal is to compete with the best policy in a given class without assuming realizability or imposing model restrictions on losses or rewards. In this work, we establish the first fast rate for regret relative to the best-in-class policy. Our proposed algorithm updates the policy at every round by minimizing a pessimistic objective, defined as a clipped inverse-propensity estimate of the policy value plus a variance penalty. By leveraging entropy assumptions on the policy class and a Hölderian error-bound condition (a generalization of the margin condition), we achieve fast best-in-class regret rates, including polylogarithmic rates in the parametric case. The analysis is driven by a sequential self-normalized maximal inequality for bounded martingale empirical processes, which yields uniform variance-adaptive confidence bounds and guarantees pessimism under adaptive data collection.

Subjects:

Machine Learning (stat.ML); Machine Learning (cs.LG)

Cite as: arXiv:2510.15483 [stat.ML]

(or arXiv:2510.15483v2 [stat.ML] for this version)

https://doi.org/10.48550/arXiv.2510.15483

arXiv-issued DOI via DataCite

Submission history

From: Houssam Zenati [view email] [v1] Fri, 17 Oct 2025 09:53:42 UTC (464 KB) [v2] Fri, 3 Apr 2026 17:49:49 UTC (671 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modelannounceupdate

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Fast Best-i…modelannounceupdateanalysisstudypolicyarXiv stat.…

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 309 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Laws & Regulation