Softmax gradient policy for variance minimization and risk-averse multi armed bandits

arXiv cs.LG | by Gabriel Turinici | April 2, 2026

arXiv:2604.00241v1 Announce Type: new

Abstract: Algorithms for the Multi-Armed Bandit (MAB) problem play a central role in sequential decision-making and have been extensively explored both theoretically and numerically. While most classical approaches aim to identify the arm with the highest expected reward, we focus on a risk-aware setting where the goal is to select the arm with the lowest variance, favoring stability over potentially high but uncertain returns. To model the decision process, we consider a softmax parameterization of the policy; we propose a new algorithm to select the minimal variance (or minimal risk) arm and prove its convergence under natural conditions. The algorithm constructs an unbiased estimate of the objective by using two independent draws from the current arm's distribution. We provide numerical experiments that illustrate the practical behavior of these algorithms and offer guidance on implementation choices. The setting also covers general risk-aware problems where there is a trade-off between maximizing the average reward and minimizing its variance.
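
The abstract does not spell out the update rule, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes Gaussian arms, a fixed learning rate, and a plain score-function (REINFORCE-style) gradient step on the softmax parameters. The one ingredient taken from the abstract is the two-draw idea: for two independent rewards X and X' from the same arm, (X - X')^2 / 2 is an unbiased estimate of that arm's variance. The function names `softmax` and `run_min_variance_bandit` and all parameter values are hypothetical.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over the arm preferences."""
    z = theta - np.max(theta)
    e = np.exp(z)
    return e / e.sum()


def run_min_variance_bandit(means, stds, steps=20000, lr=0.05, seed=0):
    """Softmax policy-gradient descent on the expected variance of the pulled arm.

    Assumptions (not from the paper): Gaussian rewards, constant step size,
    no baseline, and the objective J(theta) = sum_a pi_theta(a) * var_a.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    theta = np.zeros(len(means))

    for _ in range(steps):
        pi = softmax(theta)
        arm = rng.choice(len(pi), p=pi)

        # Two independent draws from the selected arm.
        x1, x2 = rng.normal(means[arm], stds[arm], size=2)

        # Unbiased estimate of the selected arm's variance.
        var_hat = 0.5 * (x1 - x2) ** 2

        # Gradient of log pi(arm) w.r.t. theta for a softmax policy:
        # one-hot(arm) - pi.
        grad_log = -pi
        grad_log[arm] += 1.0

        # Descent step, because the variance is being minimized.
        theta -= lr * var_hat * grad_log

    return softmax(theta)


if __name__ == "__main__":
    # Arm 1 has the smallest standard deviation, so the policy should
    # end up concentrating most of its probability there.
    probs = run_min_variance_bandit(means=[1.0, 0.5, 0.0], stds=[2.0, 0.3, 1.0])
    print(probs)
```

Because the variance estimate is non-negative, each step lowers the preference of the arm just pulled by an amount proportional to its estimated variance, so noisy arms are pushed away faster than stable ones. A mean-variance trade-off of the kind mentioned at the end of the abstract could presumably be sketched by replacing `var_hat` with a weighted combination of the variance estimate and the negative reward, but the exact form used in the paper is not given here.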

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)

Cite as: arXiv:2604.00241 [cs.LG]

(or arXiv:2604.00241v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2604.00241

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Gabriel Turinici

[v1] Tue, 31 Mar 2026 21:08:14 UTC (185 KB)

Original source

arXiv cs.LG

https://arxiv.org/abs/2604.00241
