Cross-Vehicle 3D Geometric Consistency for Self-Supervised Surround Depth Estimation on Articulated Vehicles
arXiv:2604.02639v1 Announce Type: new
Abstract: Surround depth estimation provides a cost-effective alternative to LiDAR for 3D perception in autonomous driving. While recent self-supervised methods explore multi-camera settings to improve scale awareness and scene coverage, they are primarily designed for passenger vehicles and rarely consider articulated vehicles or robotics platforms. The articulated structure introduces complex cross-segment geometry and motion coupling, making consistent depth reasoning across views more challenging. In this work, we propose ArticuSurDepth, a self-supervised framework for surround-view depth estimation on articulated vehicles that enhances depth learning through cross-view and cross-vehicle geometric consistency guided by structural priors from a vision foundation model. Specifically, we introduce a multi-view spatial context enrichment strategy and a cross-view surface normal constraint to improve structural coherence across spatial and temporal contexts. We further incorporate camera height regularization with ground-plane awareness to encourage metric depth estimation, together with cross-vehicle pose consistency that bridges motion estimation between articulated segments. To validate the proposed method, we built an articulated vehicle experiment platform and collected a dataset on it. Experimental results demonstrate state-of-the-art (SoTA) performance in depth estimation on our self-collected dataset as well as on the DDAD, nuScenes, and KITTI benchmarks.
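The abstract does not spell out how the camera height regularization is formulated. As a rough illustration of the general idea only, and not the authors' method, the sketch below back-projects pixels assumed to lie on the ground, measures their distance from the camera along an assumed ground normal, and penalizes the relative gap to a known mounting height. The function names, the pinhole intrinsics tuple `(fx, fy, cx, cy)`, the level-camera default normal, and the use of a median are all assumptions made for this example.

```python
from statistics import median


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth along the optical axis to a 3D point
    in the camera frame (x right, y down, z forward)."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def estimate_camera_height(ground_pixels, intrinsics, ground_normal=(0.0, 1.0, 0.0)):
    """Estimate camera height from predicted depth at ground pixels.

    ground_pixels: iterable of (u, v, depth) assumed to lie on the ground.
    intrinsics:    (fx, fy, cx, cy) of a pinhole camera (assumed model).
    ground_normal: unit vector in the camera frame pointing from the camera
                   toward the ground; the default assumes a level camera.
    """
    fx, fy, cx, cy = intrinsics
    heights = []
    for u, v, d in ground_pixels:
        point = backproject(u, v, d, fx, fy, cx, cy)
        # Signed distance of the point from the camera along the ground normal.
        heights.append(sum(n * c for n, c in zip(ground_normal, point)))
    # The median is an illustrative choice to damp outlier pixels.
    return median(heights)


def camera_height_loss(ground_pixels, intrinsics, known_height,
                       ground_normal=(0.0, 1.0, 0.0)):
    """Relative error between estimated and known camera height.
    It is zero when the predicted depth is at the correct metric scale."""
    estimated = estimate_camera_height(ground_pixels, intrinsics, ground_normal)
    return abs(estimated - known_height) / known_height
```

In this toy formulation, a depth map that is uniformly half the true scale gives an estimated height of half the known height and therefore a loss of 0.5, which is the kind of signal that could push predictions toward metric scale.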
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.02639 [cs.CV]
(or arXiv:2604.02639v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2604.02639
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Weimin Liu [v1] Fri, 3 Apr 2026 02:10:11 UTC (5,922 KB)