
Fast Best-in-Class Regret for Contextual Bandits

arXiv stat.ML | by Samuel Girard, Aurelien Bibaut, Arthur Gretton, Nathan Kallus, Houssam Zenati | April 6, 2026




Abstract: We study the problem of stochastic contextual bandits in the agnostic setting, where the goal is to compete with the best policy in a given class without assuming realizability or imposing model restrictions on losses or rewards. In this work, we establish the first fast rate for regret relative to the best-in-class policy. Our proposed algorithm updates the policy at every round by minimizing a pessimistic objective, defined as a clipped inverse-propensity estimate of the policy value plus a variance penalty. By leveraging entropy assumptions on the policy class and a Hölderian error-bound condition (a generalization of the margin condition), we achieve fast best-in-class regret rates, including polylogarithmic rates in the parametric case. The analysis is driven by a sequential self-normalized maximal inequality for bounded martingale empirical processes, which yields uniform variance-adaptive confidence bounds and guarantees pessimism under adaptive data collection.
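To make the per-round update concrete, here is a minimal sketch of a pessimistic objective of the kind the abstract describes: a clipped inverse-propensity estimate plus a variance penalty, minimized over a policy class. It is an illustration only, not the authors' algorithm. The abstract does not give the clipping level, the exact penalty, or how either changes over rounds, so `clip`, `lam`, the square-root sample-variance penalty, the finite policy class, and the function names `pessimistic_objective` and `select_policy` are all assumptions made for this example. Outcomes are treated as losses, so pessimism means adding the penalty before minimizing.

```python
import numpy as np


def pessimistic_objective(policy_actions, logged_actions, losses,
                          propensities, clip=10.0, lam=1.0):
    """Clipped IPS loss estimate plus a variance penalty (illustrative form).

    clip and lam are hypothetical tuning constants, not values from the paper.
    """
    n = len(losses)
    # Indicator that the candidate policy agrees with the logged action.
    match = (policy_actions == logged_actions).astype(float)
    # Inverse-propensity weights, clipped from above to control variance.
    weights = np.minimum(match / propensities, clip)
    terms = weights * losses
    estimate = terms.mean()
    # Assumed penalty: lam * sqrt(sample variance / n).
    penalty = lam * np.sqrt(terms.var() / n)
    return estimate + penalty


def select_policy(policies, contexts, logged_actions, losses,
                  propensities, **kwargs):
    """Return the index of the policy minimizing the pessimistic objective.

    policies is a finite list of callables mapping contexts to actions;
    a finite class is an assumption made to keep the sketch short.
    """
    scores = [
        pessimistic_objective(pi(contexts), logged_actions, losses,
                              propensities, **kwargs)
        for pi in policies
    ]
    return int(np.argmin(scores)), scores
```

In an online loop, one would call `select_policy` on all data logged so far at each round, act with the selected policy (mixed with whatever exploration the logging propensities encode), and append the new observation. How the paper sets the exploration and the penalty schedule is not stated in the abstract.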

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Cite as: arXiv:2510.15483 [stat.ML]

(or arXiv:2510.15483v2 [stat.ML] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.15483 (arXiv-issued DOI via DataCite)

Submission history

From: Houssam Zenati
[v1] Fri, 17 Oct 2025 09:53:42 UTC (464 KB)
[v2] Fri, 3 Apr 2026 17:49:49 UTC (671 KB)

Original source: arXiv stat.ML, https://arxiv.org/abs/2510.15483



