Robust Multi-Agent Reinforcement Learning for Small UAS Separation Assurance under GPS Degradation and Spoofing
Abstract: We address robust separation assurance for small Unmanned Aircraft Systems (sUAS) under GPS degradation and spoofing via Multi-Agent Reinforcement Learning (MARL). In cooperative surveillance, each aircraft (or agent) broadcasts its GPS-derived position; when such position broadcasts are corrupted, the entire observed air traffic state becomes unreliable. We cast this state observation corruption as a zero-sum game between the agents and an adversary: with probability R, the adversary perturbs the observed state to maximally degrade each agent's safety performance. We derive a closed-form expression for this adversarial perturbation, bypassing adversarial training entirely and enabling linear-time evaluation in the state dimension. We show that this expression approximates the true worst-case adversarial perturbation with second-order accuracy. We further bound the safety performance gap between clean and corrupted observations, showing that it degrades at most linearly with the corruption probability under Kullback-Leibler regularization. Finally, we integrate the closed-form adversarial policy into a MARL policy gradient algorithm to obtain a robust counter-policy for the agents. In a high-density sUAS simulation, we observe near-zero collision rates under corruption levels up to 35%, outperforming a baseline policy trained without adversarial perturbations.
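The abstract does not spell out the closed-form perturbation, so the following is only a minimal sketch of the corruption model it describes: with probability R, an adversary replaces the clean observation with a worst-case perturbed one, computable in time linear in the state dimension. As an illustrative stand-in (an assumption, not the paper's formula), we use a linear value function V(s) = w·s, for which the exact minimizer over an L2 perturbation ball has a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an agent's learned value function: V(s) = w @ s.
# The paper's actual value network and closed-form expression are not
# given in the abstract; this linear model is purely illustrative.
w = rng.normal(size=8)

def worst_case_perturbation(w, eps):
    """Closed-form perturbation minimizing the linear value V(s) = w @ s.

    For linear V, the minimizer of V(s + delta) over ||delta||_2 <= eps is
    delta = -eps * w / ||w||, computable in O(d) time -- consistent with
    the abstract's linear-time claim (the paper's formula may differ).
    """
    return -eps * w / np.linalg.norm(w)

def corrupt_observation(s, w, eps=0.5, R=0.35):
    """With probability R, replace the clean observation by its
    adversarially perturbed counterpart; otherwise pass it through."""
    if rng.random() < R:
        return s + worst_case_perturbation(w, eps)
    return s

s = rng.normal(size=8)
delta = worst_case_perturbation(w, eps=0.5)
assert np.isclose(np.linalg.norm(delta), 0.5)  # budget is saturated
assert w @ (s + delta) < w @ s                 # value strictly degraded
```

Under this toy model, a robust counter-policy would be trained against `corrupt_observation` rather than the clean state stream, mirroring the paper's integration of the adversarial policy into MARL policy-gradient training.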
Comments: This work has been submitted to the IEEE for possible publication
Subjects:
Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as: arXiv:2603.28900 [cs.RO]
(or arXiv:2603.28900v1 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2603.28900
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Alex Zongo [view email] [v1] Mon, 30 Mar 2026 18:26:59 UTC (686 KB)