Secure Reinforcement Learning: On Model-Free Detection of Man in the Middle Attacks

arXiv, March 31, 2026

arXiv:2603.27592v1. Announce Type: cross. Authors: Rishi Rani, Massimo Franceschetti

Abstract: We consider the problem of learning-based man-in-the-middle (MITM) attacks in cyber-physical systems (CPS), and extend our previously proposed Bellman Deviation Detection (BDD) framework for model-free reinforcement learning (RL). We refine the standard MDP attack model by allowing the reward function to depend on both the current and subsequent states, thereby capturing reward variations induced by errors in the adversary's transition estimate. We also derive an optimal system-identification strategy for the adversary that minimizes detectable value deviations. Further, we prove that the agent's asymptotic learning time required to secure the system scales linearly with the adversary's learning time, and that this matches the optimal lower bound. Hence, the proposed detection scheme is order-optimal in detection efficiency. Finally, we extend the framework to asynchronous and intermittent attack scenarios, where reliable detection is preserved.
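The abstract does not spell out the BDD test itself, so the following is only a minimal sketch of one simple reading of "Bellman deviation": given a tabular Q function learned under nominal conditions, check whether observed transitions (s, a, r, s') still satisfy the Bellman equation on average. The function names, the threshold, and the two-state toy MDP are hypothetical illustrations, not the authors' algorithm. The reward is recorded per transition, so it may depend on both the current and the subsequent state.

```python
def bellman_deviation(Q, transitions, gamma=0.9):
    """Mean signed Bellman residual of Q over observed (s, a, r, s_next) tuples.

    Q is a list of per-state lists of action values (hypothetical tabular form).
    """
    residuals = [
        r + gamma * max(Q[s_next]) - Q[s][a]
        for s, a, r, s_next in transitions
    ]
    return sum(residuals) / len(residuals)


def is_suspicious(Q, transitions, gamma=0.9, threshold=0.1):
    """Flag the data if the average residual drifts past an assumed threshold."""
    return abs(bellman_deviation(Q, transitions, gamma)) > threshold


# Toy MDP (assumed): one action; state 0 -> state 1 with reward 0,
# state 1 -> state 1 with reward 1. With gamma = 0.9 the exact values are
# Q(1, 0) = 1 / (1 - 0.9) = 10 and Q(0, 0) = 0 + 0.9 * 10 = 9.
Q = [[9.0], [10.0]]

nominal = [(0, 0, 0.0, 1), (1, 0, 1.0, 1)]
# Hypothetical tampering: the reward observed in state 1 is altered in transit.
attacked = [(0, 0, 0.0, 1), (1, 0, 0.5, 1)]

print(is_suspicious(Q, nominal))
print(is_suspicious(Q, attacked))
```

On the nominal data both residuals vanish, so nothing is flagged. In the tampered data the altered reward leaves a residual of -0.5 on one of the two transitions, which pushes the average past the assumed threshold.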

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Cite as: arXiv:2603.27592 [eess.SY]

(or arXiv:2603.27592v1 [eess.SY] for this version)

https://doi.org/10.48550/arXiv.2603.27592

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Rishi Rani
[v1] Sun, 29 Mar 2026 09:18:00 UTC (344 KB)
