
An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

arXiv, March 30, 2026

arXiv:2603.26647v1 Announce Type: new

Authors: Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff


Abstract: We study the stochastic multi-armed bandit (MAB) problem where an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous works rely on the assumption that all actions are permanently accessible, we investigate the more practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies dynamically in each round. This framework models real-world systems with both structural dependencies and volatility, such as social networks where users provide side-information about their peers' preferences, yet are not always online to be queried. To address this challenge, we propose UCB-LP-A, a novel policy that leverages a Linear Programming (LP) approach to optimize exploration-exploitation trade-offs under stochastic availability. Unlike standard network bandit algorithms that assume constant access, UCB-LP-A computes an optimal sampling distribution over the realizable activation sets, ensuring that the necessary observations are gathered using only the currently active arms. We derive a theoretical upper bound on the regret of our policy, characterizing the impact of both the network structure and the activation probabilities. Finally, we demonstrate through numerical simulations that UCB-LP-A significantly outperforms existing heuristics that ignore either the side-information or the availability constraints.
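The observation model in the abstract (a bipartite graph from actions to unknowns, with a random activation set each round) can be made concrete with a small simulation. The sketch below is only an illustration of that setting, not the UCB-LP-A policy: the toy graph, the availability probabilities, the assumption that actions become available independently, and the placeholder rule "play the available action with the most neighbors" are all assumptions introduced here, and every name in the code is hypothetical.

```python
import random

# Hypothetical toy instance: action names, unknown names, edges, and
# availability probabilities are illustrative and do not come from the paper.
GRAPH = {
    "a1": {"u1", "u2"},
    "a2": {"u2", "u3"},
    "a3": {"u3"},
}
AVAILABILITY = {"a1": 0.5, "a2": 0.9, "a3": 1.0}


def sample_activation_set(availability, rng):
    """Draw the set of feasible actions for one round.

    Assumption: each action is available independently with its own
    probability. The abstract only states that availability is stochastic.
    """
    return {a for a, p in availability.items() if rng.random() < p}


def observed_unknowns(graph, action):
    """Return the unknowns revealed by playing `action` (its graph neighbors)."""
    return set(graph[action])


def run_rounds(graph, availability, num_rounds, seed=0):
    """Count how often each unknown is observed over `num_rounds` rounds.

    The selection rule here is a placeholder (most neighbors among the
    currently available actions); it is not the paper's LP-based policy.
    """
    rng = random.Random(seed)
    counts = {u: 0 for neighbors in graph.values() for u in neighbors}
    for _ in range(num_rounds):
        active = sample_activation_set(availability, rng)
        if not active:
            continue  # nothing can be played in this round
        # sorted() makes tie-breaking deterministic across runs
        action = max(sorted(active), key=lambda a: len(graph[a]))
        for unknown in observed_unknowns(graph, action):
            counts[unknown] += 1
    return counts


if __name__ == "__main__":
    print(run_rounds(GRAPH, AVAILABILITY, num_rounds=1000))
```

Even in this toy setting, how often an unknown is observed depends on both the graph and the availability probabilities: `u1` is only revealed in rounds where `a1` happens to be active. The abstract describes the regret bound as depending on these same two factors.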

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Cite as: arXiv:2603.26647 [cs.LG]

(or arXiv:2603.26647v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.26647

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ashutosh Soni [v1] Fri, 27 Mar 2026 17:50:42 UTC (918 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26647
