
M2H-MX: Multi-Task Dense Visual Perception for Real-Time Monocular Spatial Understanding

arXiv cs.CV · by U. V. B. L. Udugama, George Vosselman, Francesco Nex · April 1, 2026 · 1 min read

arXiv:2603.29236v1 Announce Type: new


Abstract: Monocular cameras are attractive for robotic perception due to their low cost and ease of deployment, yet achieving reliable real-time spatial understanding from a single image stream remains challenging. While recent multi-task dense prediction models have improved per-pixel depth and semantic estimation, translating these advances into stable monocular mapping systems is still non-trivial. This paper presents M2H-MX, a real-time multi-task perception model for monocular spatial understanding. The model preserves multi-scale feature representations while introducing register-gated global context and controlled cross-task interaction in a lightweight decoder, enabling depth and semantic predictions to reinforce each other under strict latency constraints. Its outputs integrate directly into an unmodified monocular SLAM pipeline through a compact perception-to-mapping interface. We evaluate both dense prediction accuracy and in-the-loop system performance. On NYUDv2, M2H-MX-L achieves state-of-the-art results, improving semantic mIoU by 6.6% and reducing depth RMSE by 9.4% over representative multi-task baselines. When deployed in a real-time monocular mapping system on ScanNet, M2H-MX reduces average trajectory error by 60.7% compared to a strong monocular SLAM baseline while producing cleaner metric-semantic maps. These results demonstrate that modern multi-task dense prediction can be reliably deployed for real-time monocular spatial perception in robotic systems.
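The abstract reports gains in semantic mIoU and depth RMSE, the two standard dense-prediction metrics for this task family. For readers unfamiliar with them, here is a minimal NumPy sketch of how each is typically computed; the function names and toy arrays are illustrative and not taken from the paper or its evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (semantic segmentation)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return float(np.mean(ious))

def depth_rmse(pred, target, valid_mask=None):
    """Root-mean-square depth error over valid pixels (here: nonzero ground truth)."""
    if valid_mask is None:
        valid_mask = target > 0
    diff = pred[valid_mask] - target[valid_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy 2x2 maps, purely for illustration
seg_pred = np.array([[0, 1], [1, 1]])
seg_gt   = np.array([[0, 1], [0, 1]])
print(mean_iou(seg_pred, seg_gt, num_classes=2))  # (1/2 + 2/3) / 2 ≈ 0.583

d_pred = np.array([[1.0, 2.0], [3.0, 4.0]])
d_gt   = np.array([[1.0, 2.0], [3.0, 5.0]])
print(depth_rmse(d_pred, d_gt))  # sqrt(mean([0, 0, 0, 1])) = 0.5
```

Benchmark implementations (e.g. for NYUDv2) usually add details such as ignore labels and per-image depth masking, so treat this only as the conceptual core of the reported numbers.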

Comments: 6 pages, 5 figures, 5 tables. Preprint under review

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.29236 [cs.CV]

(or arXiv:2603.29236v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.29236

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Bavantha Lakshan Udugama Udugama Vithanage [v1] Tue, 31 Mar 2026 04:07:42 UTC (1,875 KB)
