
E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

arXiv · March 31, 2026 · 2 min read

Authors: Qitao Zhao, Hao Tan, Qianqian Wang, Sai Bi, Kai Zhang, Kalyan Sunkavalli, Shubham Tulsiani, Hanwen Jiang


Abstract: Self-supervised pre-training has driven rapid progress in foundation models for language, 2D images, and video, yet remains largely unexplored for learning 3D-aware representations from multi-view images. In this paper, we present E-RayZer, a self-supervised 3D vision model that learns geometrically grounded representations directly from unlabeled images. Unlike prior self-supervised methods such as RayZer, which infer 3D indirectly through latent-space view synthesis, E-RayZer operates directly in 3D space, performing self-supervised 3D reconstruction with Explicit geometry. This formulation eliminates shortcut solutions and yields representations that are 3D-aware. To ensure convergence and scalability, we introduce a fine-grained learning curriculum that organizes training from easy to hard samples and harmonizes heterogeneous data sources without any supervision. Experiments show that E-RayZer significantly outperforms RayZer on pose estimation and matches or sometimes surpasses fully supervised reconstruction models such as VGGT. Furthermore, its learned representations outperform leading visual pre-training models (e.g., DINOv3, CroCo v2, VideoMAE V2, and RayZer) on 3D downstream tasks, establishing E-RayZer as a promising paradigm for spatial visual pre-training.
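
The abstract mentions a curriculum that orders training from easy to hard samples but gives no details of how it is built. As a rough illustration of the general idea only, and not the authors' method, the sketch below assumes each sample carries a hypothetical `difficulty` score in [0, 1] and that the admitted difficulty ceiling grows linearly with training progress. The names `Sample`, `curriculum_pool`, and `start` are all invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    name: str
    difficulty: float  # hypothetical score in [0, 1]; not defined in the abstract


def curriculum_pool(samples: List[Sample], step: int, total_steps: int,
                    start: float = 0.2) -> List[Sample]:
    """Return the samples eligible for training at a given step.

    The difficulty ceiling grows linearly from `start` to 1.0 (an assumed
    schedule), so early steps see only easy samples and late steps see all.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    ceiling = start + (1.0 - start) * progress
    return [s for s in samples if s.difficulty <= ceiling]


# Toy data: the sample names are illustrative placeholders.
samples = [
    Sample("small_baseline", 0.1),
    Sample("medium_baseline", 0.5),
    Sample("wide_baseline", 0.9),
]

early = [s.name for s in curriculum_pool(samples, step=0, total_steps=100)]
mid = [s.name for s in curriculum_pool(samples, step=50, total_steps=100)]
late = [s.name for s in curriculum_pool(samples, step=100, total_steps=100)]
print(early, mid, late)
```

In this toy schedule the eligible pool only ever grows, so easy samples remain available throughout training. How E-RayZer actually scores difficulty and harmonizes its data sources is described in the paper itself, not here.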

Comments: CVPR 2026 Camera-ready. Project website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2512.10950 [cs.CV] (or arXiv:2512.10950v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2512.10950 (arXiv-issued DOI via DataCite)

Submission history

From: Qitao Zhao
[v1] Thu, 11 Dec 2025 18:59:53 UTC (12,346 KB)
[v2] Sat, 28 Mar 2026 00:35:24 UTC (10,850 KB)
