
OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models

arXiv · March 31, 2026

arXiv:2603.09326v2 Announce Type: replace

Authors: Tengjin Weng, Wenhao Jiang, Jingyi Wang, Ming Li, Lin Ma, Zhong Ming


Abstract: Multimodal large language models (MLLMs) have achieved remarkable performance across a wide range of vision-language tasks. However, their ability in low-level visual perception, particularly in detecting fine-grained visual discrepancies, remains underexplored and lacks systematic analysis. In this work, we introduce OddGridBench, a controllable benchmark for evaluating the visual discrepancy sensitivity of MLLMs. OddGridBench comprises over 1,400 grid-based images, where a single element differs from all others by one or multiple visual attributes such as color, size, rotation, or position. Experiments reveal that all evaluated MLLMs, including open-source families such as Qwen3-VL and InternVL3.5, and proprietary systems like Gemini-2.5-Pro and GPT-5, perform far below human levels in visual discrepancy detection. We further propose OddGrid-GRPO, a reinforcement learning framework that integrates curriculum learning and a distance-aware reward. By progressively controlling the difficulty of training samples and incorporating spatial proximity constraints into the reward design, OddGrid-GRPO significantly enhances the model's fine-grained visual discrimination ability. We hope OddGridBench and OddGrid-GRPO will lay the groundwork for advancing perceptual grounding and visual discrepancy sensitivity in multimodal intelligence. Code and dataset are available at this https URL.
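The abstract names two training ingredients for OddGrid-GRPO, a curriculum over sample difficulty and a distance-aware reward with spatial proximity constraints, but gives no formulas. The following is a hypothetical sketch of how such pieces could plug into GRPO's group-relative advantage (each sampled answer's reward standardized by the mean and standard deviation of its group). The reward shape (exact hit scores 1.0, misses earn partial credit that decays with Euclidean grid distance), the linear difficulty schedule, and the function names `distance_aware_reward`, `curriculum_pool`, and `group_relative_advantages` are all assumptions for illustration, not the authors' released code.

```python
import math
from statistics import mean, pstdev


def distance_aware_reward(pred, target, rows, cols, alpha=1.0):
    """Hypothetical distance-aware reward for locating the odd grid cell.

    pred and target are (row, col) coordinates; pred is None when the model's
    answer could not be parsed. Exact hits score 1.0. Misses score partial
    credit in [0, 0.5) that shrinks with Euclidean distance, normalized by the
    grid diagonal so the penalty is comparable across grid sizes.
    """
    if pred is None:
        return 0.0
    r, c = pred
    if not (0 <= r < rows and 0 <= c < cols):
        return 0.0  # answers outside the grid earn nothing
    if tuple(pred) == tuple(target):
        return 1.0
    d = math.dist(pred, target)
    d_max = math.dist((0, 0), (rows - 1, cols - 1))  # > 0 whenever a miss is possible
    return 0.5 * (1.0 - d / d_max) ** alpha


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def curriculum_pool(samples, step, total_steps, num_levels):
    """Hypothetical linear curriculum: unlock harder levels as training proceeds.

    Each sample carries an integer "difficulty" in [0, num_levels). At step 0
    only level 0 is available; by the final step every level is unlocked.
    """
    unlocked = 1 + min(num_levels - 1, int(step / total_steps * num_levels))
    return [s for s in samples if s["difficulty"] < unlocked]


if __name__ == "__main__":
    target = (2, 3)  # odd cell in an assumed 5x5 grid
    group = [(2, 3), (2, 2), (0, 0), None]  # sampled answers for one prompt
    rewards = [distance_aware_reward(p, target, 5, 5) for p in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Giving near misses partial credit means a group in which no sample finds the exact cell can still produce non-zero advantages, which is one plausible reading of why a spatial proximity term would help; the paper's actual reward may differ.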

Comments: accepted by CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.09326 [cs.CV]

(or arXiv:2603.09326v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.09326

arXiv-issued DOI via DataCite

Submission history

From: Tengjin Weng
[v1] Tue, 10 Mar 2026 08:01:30 UTC (7,152 KB)
[v2] Mon, 30 Mar 2026 12:07:08 UTC (7,147 KB)
