An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability
arXiv:2603.26647v1 Announce Type: new Abstract: We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. Th — Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff
View PDF HTML (experimental)
Abstract:We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. This framework models real-world systems with both structural dependencies and volatility, such as social networks where users provide side-information about their peers' preferences, yet are not always online to be queried. To address this challenge, we propose UCB-LP-A, a novel policy that leverages a Linear Programming (LP) approach to optimize exploration-exploitation trade-offs under stochastic availability. Unlike standard network bandit algorithms that assume constant access, UCB-LP-A computes an optimal sampling distribution over the realizable activation sets, ensuring that the necessary observations are gathered using only the currently active arms. We derive a theoretical upper bound on the regret of our policy, characterizing the impact of both the network structure and the activation probabilities. Finally, we demonstrate through numerical simulations that UCB-LP-A significantly outperforms existing heuristics that ignore either the side-information or the availability constraints.
Subjects:
Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as: arXiv:2603.26647 [cs.LG]
(or arXiv:2603.26647v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.26647
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Ashutosh Soni [view email] [v1] Fri, 27 Mar 2026 17:50:42 UTC (918 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivOptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training
OptiMer enables flexible continual pre-training by decoupling data mixture ratio selection from training through post-hoc Bayesian optimization of distribution vectors extracted from individual dataset models. (1 upvotes on HuggingFace)
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
Discrete Native Autoregressive framework enables unified multimodal processing by representing diverse modalities in a shared discrete space through a novel visual transformer architecture. (43 upvotes on HuggingFace)
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
CARLA-Air integrates high-fidelity driving and multirotor flight simulation within a unified Unreal Engine framework, supporting joint air-ground agent modeling with photorealistic environments and multi-modal sensing capabilities. (1 upvotes on HuggingFace)
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
CARLA-Air integrates high-fidelity driving and multirotor flight simulation within a unified Unreal Engine framework, supporting joint air-ground agent modeling with photorealistic environments and multi-modal sensing capabilities. (1 upvotes on HuggingFace)
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
Discrete Native Autoregressive framework enables unified multimodal processing by representing diverse modalities in a shared discrete space through a novel visual transformer architecture. (43 upvotes on HuggingFace)
OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training
OptiMer enables flexible continual pre-training by decoupling data mixture ratio selection from training through post-hoc Bayesian optimization of distribution vectors extracted from individual dataset models. (1 upvotes on HuggingFace)
AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing
AutoWeather4D is a 3D-aware weather editing framework that decouples geometry and illumination through a dual-pass mechanism, enabling efficient and physically accurate weather modification for autonomous driving applications. (1 upvotes on HuggingFace)
Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!