Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera

arXiv, March 31, 2026

arXiv:2511.18174v2 (announce type: replace)

Authors: Mukai Yu, Mosam Dabhi, Liuyue Xie, Sebastian Scherer, László A. Jeni

Abstract: Modern perception increasingly relies on fisheye, panoramic, and other wide field-of-view (FoV) cameras, yet most pipelines still apply planar CNNs designed for pinhole imagery on 2D grids, where pixel-space neighborhoods misrepresent physical adjacency and models are sensitive to global rotations. Traditional spherical CNNs partially address this mismatch but require a costly spherical harmonic transform that constrains resolution and efficiency. We present Unified Spherical Frontend (USF), a distortion-free, lens-agnostic framework that transforms images from any calibrated camera onto the unit sphere via ray-direction correspondences, and performs spherical resampling, convolution, and pooling canonically in the spatial domain. USF is modular: projection, location sampling, value interpolation, and resolution control are fully decoupled. Its configurable distance-only convolution kernels offer rotation-equivariance, mirroring translation-equivariance in planar CNNs while avoiding harmonic transforms entirely. We compare multiple standard planar backbones with their spherical counterparts across classification, detection, and segmentation tasks on synthetic (Spherical MNIST) and real-world (PANDORA, Stanford 2D-3D-S) datasets, and stress-test robustness to extreme lens distortions, varying FoV, and arbitrary rotations. USF scales efficiently to high-resolution spherical imagery, maintains a performance drop of less than 1% under random test-time rotations without training-time rotational augmentation, and enables zero-shot generalization to any unseen (wide-FoV) lens with minimal performance degradation.
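The two ideas the abstract names, lifting pixels to ray directions on the unit sphere and convolving with a kernel that depends only on angular distance, can be illustrated with a small sketch. This is not the authors' implementation: the equidistant fisheye model (r = f * theta), the Gaussian kernel, the brute-force all-pairs sum, and every function name and parameter value below are assumptions chosen for illustration only. The sketch checks that rotating every ray by the same rotation leaves each per-point output unchanged, which is the equivariance property a distance-only kernel gives on a fixed point set.

```python
import math
import random

def fisheye_pixel_to_ray(u, v, cx, cy, f):
    """Assumed equidistant fisheye model (r = f * theta): pixel -> unit ray."""
    dx, dy = u - cx, v - cy
    theta = math.hypot(dx, dy) / f   # angle from the optical axis
    phi = math.atan2(dy, dx)         # azimuth around the optical axis
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))

def geodesic_angle(p, q):
    """Great-circle angle between two unit vectors, clamped for safety."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))

def distance_only_conv(points, values, sigma=0.3):
    """out[i] = sum_j k(angle(p_i, p_j)) * values[j], k Gaussian in the angle."""
    out = []
    for p in points:
        acc = 0.0
        for q, val in zip(points, values):
            ang = geodesic_angle(p, q)
            acc += math.exp(-ang * ang / (2.0 * sigma * sigma)) * val
        out.append(acc)
    return out

def rotate_z_then_x(p, a, b):
    """Rotate a 3D point by angle a about z, then by angle b about x."""
    x, y, z = p
    x, y = math.cos(a) * x - math.sin(a) * y, math.sin(a) * x + math.cos(a) * y
    y, z = math.cos(b) * y - math.sin(b) * z, math.sin(b) * y + math.cos(b) * z
    return (x, y, z)

# Hypothetical 640x480 image with focal length 300 px and a centered principal point.
rng = random.Random(0)
pixels = [(rng.uniform(0, 640), rng.uniform(0, 480)) for _ in range(50)]
points = [fisheye_pixel_to_ray(u, v, 320.0, 240.0, 300.0) for u, v in pixels]
values = [rng.random() for _ in pixels]

out_original = distance_only_conv(points, values)
rotated = [rotate_z_then_x(p, 0.7, -1.2) for p in points]
out_rotated = distance_only_conv(rotated, values)

# Pairwise angles are rotation-invariant, so the outputs agree up to rounding.
max_diff = max(abs(a - b) for a, b in zip(out_original, out_rotated))
```

Because the kernel sees only the angle between two rays, the camera model matters only in the first function; swapping in a different calibrated projection would leave the convolution untouched, which is the decoupling the abstract describes at a high level.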

Comments: Accepted to CVPR 2026. Camera-ready version. Added computation benchmark

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2511.18174 [cs.CV] (or arXiv:2511.18174v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2511.18174 (arXiv-issued DOI via DataCite)

Submission history

From: Mukai Yu
[v1] Sat, 22 Nov 2025 19:57:46 UTC (40,107 KB)
[v2] Mon, 30 Mar 2026 06:36:53 UTC (40,150 KB)
