Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards

arXiv, March 31, 2026

Authors: Kihyun Yu, Seoungbin Bae, Dabeen Lee


Abstract: We study safe reinforcement learning in finite-horizon linear mixture constrained Markov decision processes (CMDPs) with adversarial rewards under full-information feedback and an unknown transition kernel. We propose a primal-dual policy optimization algorithm that achieves regret and constraint violation bounds of $\widetilde{O}(\sqrt{d^2 H^3 K})$ under mild conditions, where $d$ is the feature dimension, $H$ is the horizon, and $K$ is the number of episodes. To the best of our knowledge, this is the first provably efficient algorithm for linear mixture CMDPs with adversarial rewards. In particular, our regret bound is near-optimal, matching the known minimax lower bound up to logarithmic factors. The key idea is to introduce a regularized dual update that enables a drift-based analysis. This step is essential, as strong duality-based analysis cannot be directly applied when reward functions change across episodes. In addition, we extend weighted ridge regression-based parameter estimation to the constrained setting, allowing us to construct tighter confidence intervals that are crucial for deriving the near-optimal regret bound.
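The abstract names two generic building blocks: a regularized dual (Lagrange-multiplier) update and weighted ridge regression for estimating the unknown transition parameter. The Python sketch below illustrates textbook versions of both under stated assumptions. It is not the authors' algorithm: the abstract does not give their exact regularizer, step sizes, or weights. The names `weighted_ridge`, `regularized_dual_update`, and the parameters `lam`, `step`, and `reg` are hypothetical. The dual step assumes a projected gradient-ascent form, multiplier <- max(0, multiplier + step * (violation - reg * multiplier)), where `violation` is the observed constraint violation in an episode and `reg > 0` pulls the multiplier toward zero. This specific form is an assumption.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_ridge(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    weights: Sequence[float],
    lam: float,
) -> List[float]:
    """Hypothetical helper: argmin_theta sum_i w_i (y_i - phi_i . theta)^2 + lam * ||theta||^2.

    Closed form: theta = (lam * I + sum_i w_i phi_i phi_i^T)^{-1} sum_i w_i y_i phi_i.
    """
    if not (len(features) == len(targets) == len(weights)):
        raise ValueError("features, targets and weights must have equal length")
    d = len(features[0])
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for phi, y, w in zip(features, targets, weights):
        for i in range(d):
            rhs[i] += w * y * phi[i]
            for j in range(d):
                gram[i][j] += w * phi[i] * phi[j]
    return _solve(gram, rhs)


def regularized_dual_update(
    multiplier: float, violation: float, step: float, reg: float
) -> float:
    """Hypothetical regularized projected dual-ascent step on a Lagrange multiplier.

    A positive violation pushes the multiplier up; the -reg * multiplier term
    pulls it back toward zero; max(0, .) keeps it nonnegative.
    """
    return max(0.0, multiplier + step * (violation - reg * multiplier))
```

For example, `weighted_ridge([[1, 0], [0, 1], [1, 1]], [2, -1, 1], [1, 1, 1], 1e-9)` returns approximately `[2.0, -1.0]`, because those targets are generated by theta = (2, -1). In variance-aware analyses of linear mixture MDPs, the weights are typically derived from variance estimates rather than set to 1. Here they are simply passed in by the caller.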

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Cite as: arXiv:2603.27884 [cs.LG]

(or arXiv:2603.27884v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27884

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Kihyun Yu
[v1] Sun, 29 Mar 2026 21:51:33 UTC (160 KB)
