Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessExplained: The Source Code Leak that hit AI Giant Anthropic - Cyber MagazineGoogle News: ClaudeDespite Skepticism, Survey Shows Widespread AI Use at Cal State - Inside Higher EdGoogle News: ChatGPTBig Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.Dev.to AIYour AI Agent Did Something It Wasn't Supposed To. Now What?Dev.to AITrust drives Korea’s generative AI adoption; usability and interaction sustain use - CHOSUNBIZ - ChosunbizGoogle News: Generative AIThe Model You Love Is Probably Just the One You UseO'Reilly Radar3 of Your AI Agents Crashed and You Found Out From CustomersDev.to AIYour AI Agent Is Running Wild and You Can't Stop ItDev.to AIYour AI Agent Spent $500 Overnight and Nobody NoticedDEV CommunityWhy Software Project Estimates Are Always Wrong (And How to Fix It)DEV CommunityChatGPT vs. Claude: 7 real-life benchmarks that crown the 2026 AI Madness Champion - Tom's GuideGoogle News: ChatGPTHow to Build a Responsible AI Framework for Transparent, Ethical, and Secure AppsDev.to AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessExplained: The Source Code Leak that hit AI Giant Anthropic - Cyber MagazineGoogle News: ClaudeDespite Skepticism, Survey Shows Widespread AI Use at Cal State - Inside Higher EdGoogle News: ChatGPTBig Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.Dev.to AIYour AI Agent Did Something It Wasn't Supposed To. Now What?Dev.to AITrust drives Korea’s generative AI adoption; usability and interaction sustain use - CHOSUNBIZ - ChosunbizGoogle News: Generative AIThe Model You Love Is Probably Just the One You UseO'Reilly Radar3 of Your AI Agents Crashed and You Found Out From CustomersDev.to AIYour AI Agent Is Running Wild and You Can't Stop ItDev.to AIYour AI Agent Spent $500 Overnight and Nobody NoticedDEV CommunityWhy Software Project Estimates Are Always Wrong (And How to Fix It)DEV CommunityChatGPT vs. Claude: 7 real-life benchmarks that crown the 2026 AI Madness Champion - Tom's GuideGoogle News: ChatGPTHow to Build a Responsible AI Framework for Transparent, Ethical, and Secure AppsDev.to AI

Binary Verification for Zero-Shot Vision

arXivMarch 30, 202610 min read0 views
Source Quiz

arXiv:2511.10983v2 Announce Type: replace-cross Abstract: We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC — Rongbin Hu, Jeffrey Liu

View PDF HTML (experimental)

Abstract:We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC), spatial reasoning (Spatial-Map, Spatial-Grid, Spatial-Maze), and BLINK-Jigsaw. Relative to answering open-ended queries directly, quantization to MCQ yields large gains, and True/False binarization provides a consistent additional boost. Across all tasks, the same workflow produces significant improvements, indicating generality. We further integrate the proposed REC workflow into a real-world video processing and editing system, and present the system architecture and end-to-end pipeline in the paper. Together, these components yield a simple and unified workflow that emphasizes inference-time design over task-specific training. It offers a practical, drop-in path to stronger zero-shot vision with today's VLMs.

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2511.10983 [cs.CV]

(or arXiv:2511.10983v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2511.10983

arXiv-issued DOI via DataCite

Submission history

From: Jeffrey Liu [view email] [v1] Fri, 14 Nov 2025 06:05:43 UTC (1,872 KB) [v2] Fri, 27 Mar 2026 04:08:31 UTC (3,934 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Binary Veri…researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 196 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers