Live
Black Hat USAAI BusinessBlack Hat AsiaAI Business‘I’m not dumb’: Hong Kong’s London trade office manager denies running spy networkSCMP Tech (Asia AI)ciflow/torchtitan/178947: Update on "add API to check if a tensor is symm-mem-tensor"PyTorch ReleasesGoogle Panda Algorithm: Understanding Its Impact and How to Recover from Its ConsequencesDev.to AIComplete Guide to llm-d CNCF Sandbox — Kubernetes-Native Distributed LLM InferenceDev.to AIciflow/trunk/178016: simplify testPyTorch Releasesciflow/torchtitan/178016: simplify testPyTorch ReleasesI Built an AI Coloring Page Generator — Got 500+ Organic Visits in One DayDev.to AIHeated Rivalry: A Guide to the Best Books, Movies, Video Games, and Podcasts for Fans of the Hit SeriesDev.to AIWe're running an AI-authored research workshop for Northeast India's 200+ languages - and publishing everything openlyDev.to AIciflow/torchtitan/177627: UpdatePyTorch Releasesciflow/torchtitan/177621: UpdatePyTorch Releasestrunk/d52b2f548aa3cfcfcd499fcba764fccf29628de6: [inductor] Enable precompiled headers in fbcode (#178870) (#178870)PyTorch ReleasesBlack Hat USAAI BusinessBlack Hat AsiaAI Business‘I’m not dumb’: Hong Kong’s London trade office manager denies running spy networkSCMP Tech (Asia AI)ciflow/torchtitan/178947: Update on "add API to check if a tensor is symm-mem-tensor"PyTorch ReleasesGoogle Panda Algorithm: Understanding Its Impact and How to Recover from Its ConsequencesDev.to AIComplete Guide to llm-d CNCF Sandbox — Kubernetes-Native Distributed LLM InferenceDev.to AIciflow/trunk/178016: simplify testPyTorch Releasesciflow/torchtitan/178016: simplify testPyTorch ReleasesI Built an AI Coloring Page Generator — Got 500+ Organic Visits in One DayDev.to AIHeated Rivalry: A Guide to the Best Books, Movies, Video Games, and Podcasts for Fans of the Hit SeriesDev.to AIWe're running an AI-authored research workshop for Northeast India's 200+ languages - and publishing everything openlyDev.to AIciflow/torchtitan/177627: UpdatePyTorch Releasesciflow/torchtitan/177621: UpdatePyTorch Releasestrunk/d52b2f548aa3cfcfcd499fcba764fccf29628de6: [inductor] Enable precompiled headers in fbcode (#178870) (#178870)PyTorch Releases

Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids

arXivMarch 31, 202610 min read0 views
Source Quiz

arXiv:2603.28385v1 Announce Type: cross Abstract: Maritime surveillance missions, such as search and rescue and environmental monitoring, rely on the efficient allocation of sensing assets over vast and geometrically complex areas. Traditional Coverage Path Planning (CPP) approaches depend on decomposition techniques that struggle with irregular coastlines, islands, and exclusion zones, or require computationally expensive re-planning for every instance. We propose a Deep Reinforcement Learning (DRL) framework to solve CPP on hexagonal grid representations of irregular maritime areas. Unlike c — Carlos S. Sep\'ulveda, Gonzalo A. Ruz

View PDF HTML (experimental)

Abstract:Maritime surveillance missions, such as search and rescue and environmental monitoring, rely on the efficient allocation of sensing assets over vast and geometrically complex areas. Traditional Coverage Path Planning (CPP) approaches depend on decomposition techniques that struggle with irregular coastlines, islands, and exclusion zones, or require computationally expensive re-planning for every instance. We propose a Deep Reinforcement Learning (DRL) framework to solve CPP on hexagonal grid representations of irregular maritime areas. Unlike conventional methods, we formulate the problem as a neural combinatorial optimization task where a Transformer-based pointer policy autoregressively constructs coverage tours. To overcome the instability of value estimation in long-horizon routing problems, we implement a critic-free Group-Relative Policy Optimization (GRPO) scheme. This method estimates advantages through within-instance comparisons of sampled trajectories rather than relying on a value function. Experiments on 1,000 unseen synthetic maritime environments demonstrate that a trained policy achieves a 99.0% Hamiltonian success rate, more than double the best heuristic (46.0%), while producing paths 7% shorter and with 24% fewer heading changes than the closest baseline. All three inference modes (greedy, stochastic sampling, and sampling with 2-opt refinement) operate under 50~ms per instance on a laptop GPU, confirming feasibility for real-time on-board deployment.

Subjects:

Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

Cite as: arXiv:2603.28385 [cs.LG]

(or arXiv:2603.28385v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28385

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Carlos Sepúlveda [view email] [v1] Mon, 30 Mar 2026 12:56:38 UTC (5,184 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Critic-Free…researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 166 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers