
Softmax gradient policy for variance minimization and risk-averse multi armed bandits

arXiv cs.LG | by Gabriel Turinici | April 2, 2026


Abstract: Algorithms for the Multi-Armed Bandit (MAB) problem play a central role in sequential decision-making and have been extensively explored both theoretically and numerically. While most classical approaches aim to identify the arm with the highest expected reward, we focus on a risk-aware setting where the goal is to select the arm with the lowest variance, favoring stability over potentially high but uncertain returns. To model the decision process, we consider a softmax parameterization of the policy; we propose a new algorithm to select the minimal variance (or minimal risk) arm and prove its convergence under natural conditions. The algorithm constructs an unbiased estimate of the objective by using two independent draws from the current arm distribution. We provide numerical experiments that illustrate the practical behavior of these algorithms and offer guidance on implementation choices. The setting also covers general risk-aware problems where there is a trade-off between maximizing the average reward and minimizing its variance.
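The abstract gives no pseudocode, so the following is only a minimal sketch of one possible reading of the idea, not the paper's algorithm. It assumes Gaussian arms, a plain score-function (REINFORCE-style) gradient step on the softmax parameters, and two pulls of the same sampled arm, for which `0.5 * (x1 - x2) ** 2` is an unbiased estimate of that arm's variance. The function name `min_variance_bandit`, the step size, the step count, and the arm parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over the arm preferences."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def min_variance_bandit(means, stds, steps=5000, lr=0.05, seed=0):
    """Hypothetical sketch: softmax policy gradient that favors low-variance arms.

    Assumptions (not taken from the paper): Gaussian rewards, constant
    step size, no baseline, and two pulls of the same sampled arm per step.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    theta = np.zeros(len(means))  # start from the uniform policy

    for _ in range(steps):
        pi = softmax(theta)
        arm = rng.choice(len(pi), p=pi)

        # Two independent rewards from the sampled arm:
        # E[0.5 * (x1 - x2)^2] equals the variance of that arm.
        x1, x2 = rng.normal(means[arm], stds[arm], size=2)
        cost = 0.5 * (x1 - x2) ** 2

        # Gradient of log pi(arm) with respect to theta is e_arm - pi.
        grad_log_pi = -pi
        grad_log_pi[arm] += 1.0

        # Stochastic gradient descent on the expected variance.
        theta -= lr * cost * grad_log_pi

    return softmax(theta)


probs = min_variance_bandit(means=[0.0, 1.0, 2.0], stds=[1.0, 0.3, 2.0])
print(probs)  # most of the probability mass should sit on the arm with std 0.3
```

In this sketch the arm means play no role, because the difference of two pulls from the same arm cancels them. A mean-variance trade-off of the kind the abstract mentions could presumably be sketched by adding a reward term to `cost`, but the exact objective used in the paper is not stated here.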

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)

Cite as: arXiv:2604.00241 [cs.LG]

(or arXiv:2604.00241v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2604.00241

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Gabriel Turinici
[v1] Tue, 31 Mar 2026 21:08:14 UTC (185 KB)
