Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic says Claude subscriptions will no longer cover usage on third-party tools like OpenClaw starting April 4 at 12pm PT, to better manage capacity (Boris Cherny/@bcherny)TechmemeDoes GPT-2 Have a Fear Direction?lesswrong.comY Combinator's CEO says he ships 37,000 lines of AI code per dayHacker News AI TopShow HN: SpeechSDK – free, open-source SDK that unifies all AI voice modelsHacker News AI TopWe Ditched LangChain. Here’s What We Built Instead — and Why It’s Better for Serious AI Research.Medium AIAMD vs. Nvidia: The AI Supercycle Is Big Enough for Both. Here's the Better Buy. - AOL.comGNews AI NVIDIAFiling: Anthropic has formed AnthroPAC, a new PAC that will be funded exclusively and voluntarily by its employees and is expected to be bipartisan (Miranda Nazzaro/The Hill)TechmemeI Broke Up With ChatGPT (And My Productivity Thanked Me)Medium AIAI startup envisions '100M new people' making videogamesHacker News AI TopEsquire Singapore's One Piece "interview" mashes up AI slop and ghoulishness to make ghoulislop - AV ClubGNews AI SingaporeMost Students Think ChatGPT Helps Them Study — Here’s Why It Actually Slows Them Down (And How to…Medium AIWhen the server crashes the soulMedium AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic says Claude subscriptions will no longer cover usage on third-party tools like OpenClaw starting April 4 at 12pm PT, to better manage capacity (Boris Cherny/@bcherny)TechmemeDoes GPT-2 Have a Fear Direction?lesswrong.comY Combinator's CEO says he ships 37,000 lines of AI code per dayHacker News AI TopShow HN: SpeechSDK – free, open-source SDK that unifies all AI voice modelsHacker News AI TopWe Ditched LangChain. Here’s What We Built Instead — and Why It’s Better for Serious AI Research.Medium AIAMD vs. Nvidia: The AI Supercycle Is Big Enough for Both. Here's the Better Buy. - AOL.comGNews AI NVIDIAFiling: Anthropic has formed AnthroPAC, a new PAC that will be funded exclusively and voluntarily by its employees and is expected to be bipartisan (Miranda Nazzaro/The Hill)TechmemeI Broke Up With ChatGPT (And My Productivity Thanked Me)Medium AIAI startup envisions '100M new people' making videogamesHacker News AI TopEsquire Singapore's One Piece "interview" mashes up AI slop and ghoulishness to make ghoulislop - AV ClubGNews AI SingaporeMost Students Think ChatGPT Helps Them Study — Here’s Why It Actually Slows Them Down (And How to…Medium AIWhen the server crashes the soulMedium AI
AI NEWS HUBbyEIGENVECTOREigenvector

HighlightBench: Benchmarking Markup-Driven Table Reasoning in Scientific Documents

arXivby [Submitted on 25 Mar 2026]March 31, 20261 min read0 views
Source Quiz

arXiv:2603.26784v1 Announce Type: new Abstract: Visual markups such as highlights, underlines, and bold text are common in table-centric documents. Although multimodal large language models (MLLMs) have made substantial progress in document understanding, their ability to treat such cues as explicit logical directives remains under-explored. More importantly, existing evaluations cannot distinguish whether a model fails to see the markup or fails to reason with it. This creates a key blind spot in assessing markup-conditioned behavior over tables. To address this gap, we introduce HighlightBen — Lexin Wang, Shenghua Liu, Yiwei Wang, Yujun Cai, Yuyao Ge, Jiayu Yao, Jiafeng Guo, Xueqi Cheng

View PDF HTML (experimental)

Abstract:Visual markups such as highlights, underlines, and bold text are common in table-centric documents. Although multimodal large language models (MLLMs) have made substantial progress in document understanding, their ability to treat such cues as explicit logical directives remains under-explored. More importantly, existing evaluations cannot distinguish whether a model fails to see the markup or fails to reason with it. This creates a key blind spot in assessing markup-conditioned behavior over tables. To address this gap, we introduce HighlightBench, a diagnostic benchmark for markup-driven table understanding that decomposes evaluation into five task families: Markup Grounding, Constrained Retrieval, Local Relations, Aggregation & Comparison, and Consistency & Missingness. We further provide a reference pipeline that makes intermediate decisions explicit, enabling reproducible baselines and finer-grained attribution of errors along the perception-to-execution chain. Experiments show that even strong models remain unstable when visual cues must be consistently aligned with symbolic reasoning under structured output constraints.

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26784 [cs.CV]

(or arXiv:2603.26784v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26784

arXiv-issued DOI via DataCite

Submission history

From: Lexin Wang [view email] [v1] Wed, 25 Mar 2026 06:15:40 UTC (6,553 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
HighlightBe…researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 111 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!