
BINO: Encoder Centric Self Supervised Stereo With Native Pair Input

arXiv · March 31, 2026 · 2 min read

arXiv:2603.27904v1 · Announce Type: new · Author: Haokun Zhou

Abstract: Stereo needs features that preserve fine cross-view correspondence rather than only semantic similarity. Recent self-supervised vision models transfer well, but they are not built for this goal, and geometry-focused methods often rely on a binocular decoder or another explicit linkage module during pretraining. BINO asks whether strong binocular structure can instead be learned inside a compact encoder. It does this by fusing the rectified pair at the input stage, forming stereo micro cell tokens, and using a row-aware patch phase positional encoding. Training uses one view masked token only distillation together with occlusion and view-specific appearance mismatch. In a strict low-resource setting with pretraining only on KITTI object, BINO gives the best frozen-descriptor results under a no-linkage probe among all compared baselines on proxy dense stereo, hard-negative retrieval, and KITTI Stereo 2012 disparity. With the same lightweight stereo head for every encoder, it stays near CroCo v2 while using a much smaller encoder. Supplementary transfer experiments on KITTI Stereo 2015 show the same qualitative trend. These results suggest that much of the cross-view reasoning often assigned to a separate linkage module can be learned inside a compact and reusable encoder.
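The abstract evaluates frozen descriptors with a probe that has no linkage module, but it does not say how that probe works. The sketch below shows one plausible reading and is an assumption, not the paper's procedure: because the pair is rectified, a left pixel's match lies on the same row of the right image, so disparity can be read off by comparing descriptors along each row with cosine similarity and taking the best shift. The function name `rowwise_disparity`, the `max_disp` parameter, and the dense `(H, W, C)` descriptor layout are all hypothetical.

```python
import numpy as np


def rowwise_disparity(left_feat, right_feat, max_disp):
    """Winner-take-all disparity from frozen descriptors (illustrative only).

    left_feat, right_feat: float arrays of shape (H, W, C), one descriptor
    per pixel or patch, taken from a rectified stereo pair.
    max_disp: largest shift (in columns) to test; must be smaller than W.
    Returns an (H, W) integer array of disparities.
    """
    H, W, _ = left_feat.shape
    if not 0 <= max_disp < W:
        raise ValueError("max_disp must satisfy 0 <= max_disp < W")

    def l2_normalize(f):
        # Unit-length descriptors make the dot product a cosine similarity.
        return f / (np.linalg.norm(f, axis=-1, keepdims=True) + 1e-8)

    left = l2_normalize(left_feat)
    right = l2_normalize(right_feat)

    # scores[y, x, d] = similarity between left (y, x) and right (y, x - d).
    # Shifts that would fall outside the image stay at -inf.
    scores = np.full((H, W, max_disp + 1), -np.inf)
    for d in range(max_disp + 1):
        scores[:, d:, d] = np.sum(left[:, d:, :] * right[:, : W - d, :], axis=-1)

    # No learned matching: simply take the most similar shift on each row.
    return np.argmax(scores, axis=-1)
```

A probe of this kind adds no trainable cross-view parameters, so any matching accuracy it achieves has to come from the encoder's descriptors themselves, which is the property the abstract's frozen-descriptor comparison is concerned with.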

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27904 [cs.CV] (or arXiv:2603.27904v1 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.27904 (arXiv-issued DOI via DataCite, pending registration)

Submission history: [v1] Sun, 29 Mar 2026 23:26:09 UTC (582 KB), from Haokun Zhou
