
An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

arXiv, March 30, 2026

Authors: Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff


Abstract: We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. This framework models real-world systems with both structural dependencies and volatility, such as social networks where users provide side-information about their peers' preferences, yet are not always online to be queried. To address this challenge, we propose UCB-LP-A, a novel policy that leverages a Linear Programming (LP) approach to optimize exploration-exploitation trade-offs under stochastic availability. Unlike standard network bandit algorithms that assume constant access, UCB-LP-A computes an optimal sampling distribution over the realizable activation sets, ensuring that the necessary observations are gathered using only the currently active arms. We derive a theoretical upper bound on the regret of our policy, characterizing the impact of both the network structure and the activation probabilities. Finally, we demonstrate through numerical simulations that UCB-LP-A significantly outperforms existing heuristics that ignore either the side-information or the availability constraints.
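To make the setting in the abstract concrete, the following is a minimal Python sketch of the observation model only: a bipartite graph from actions to unknowns, a random activation set each round, and side-observations for every unknown linked to the chosen action. It is not UCB-LP-A and does not solve any LP; the selection rule is a plain UCB-style greedy heuristic swapped in for illustration. The instance (`GRAPH`, `MEANS`, `AVAIL`), the assumption that actions become available independently, the Bernoulli observations, and the scoring of an action by the average index of its unknowns are all hypothetical choices that do not come from the paper.

```python
import math
import random

# Hypothetical toy instance (illustrative numbers, not from the paper).
GRAPH = {0: [0, 1], 1: [1, 2], 2: [2, 3]}  # action -> unknowns it reveals
MEANS = [0.2, 0.5, 0.7, 0.4]               # Bernoulli mean of each unknown
AVAIL = {0: 0.9, 1: 0.5, 2: 0.7}           # assumed independent availability


def sample_activation_set(avail, rng):
    """Draw the list of actions that are feasible in this round."""
    return [a for a, p in avail.items() if rng.random() < p]


def ucb(total, count, t):
    """UCB1-style index for one unknown; never-observed unknowns get +inf."""
    if count == 0:
        return math.inf
    return total / count + math.sqrt(2.0 * math.log(t) / count)


def run(rounds, seed=0):
    """Simulate a greedy UCB heuristic restricted to the activation set."""
    rng = random.Random(seed)
    counts = [0] * len(MEANS)    # observations gathered per unknown
    totals = [0.0] * len(MEANS)  # sum of observed values per unknown
    history = []                 # (activation_set, chosen_action) per round

    for t in range(1, rounds + 1):
        active = sample_activation_set(AVAIL, rng)
        if not active:
            # No action is available, so nothing can be sampled this round.
            history.append((active, None))
            continue

        def score(action):
            # Assumed scoring: average index of the unknowns the action reveals.
            unknowns = GRAPH[action]
            return sum(ucb(totals[u], counts[u], t) for u in unknowns) / len(unknowns)

        chosen = max(active, key=score)

        # Side-observations: every unknown linked to the chosen action is sampled.
        for u in GRAPH[chosen]:
            counts[u] += 1
            totals[u] += 1.0 if rng.random() < MEANS[u] else 0.0

        history.append((active, chosen))

    return counts, totals, history


if __name__ == "__main__":
    counts, totals, history = run(1000, seed=1)
    print("observations per unknown:", counts)
```

Because unknown 1 is shared by actions 0 and 1, and unknown 2 by actions 1 and 2, they accumulate observations whenever either neighbouring action is played. This shared structure is what an LP-based sampling policy of the kind the abstract describes can exploit when deciding how often each available action needs to be sampled.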

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Cite as: arXiv:2603.26647 [cs.LG]

(or arXiv:2603.26647v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.26647

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ashutosh Soni
[v1] Fri, 27 Mar 2026 17:50:42 UTC (918 KB)
