
ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

ArXiv CS.AI, by Rongtian Ye, April 1, 2026



Abstract: Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusively on single-chart interpretation rather than comparative reasoning across multiple charts. To address this gap, we introduce ChartDiff, the first large-scale benchmark for cross-chart comparative summarization. ChartDiff consists of 8,541 chart pairs spanning diverse data sources, chart types, and visual styles, each annotated with LLM-generated and human-verified summaries describing differences in trends, fluctuations, and anomalies. Using ChartDiff, we evaluate general-purpose, chart-specialized, and pipeline-based models. Our results show that frontier general-purpose models achieve the highest GPT-based quality, while specialized and pipeline-based methods obtain higher ROUGE scores but lower human-aligned evaluation, revealing a clear mismatch between lexical overlap and actual summary quality. We further find that multi-series charts remain challenging across model families, whereas strong end-to-end models are relatively robust to differences in plotting libraries. Overall, our findings demonstrate that comparative chart reasoning remains a significant challenge for current vision-language models and position ChartDiff as a new benchmark for advancing research on multi-chart understanding.
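The mismatch between lexical overlap and summary quality noted in the abstract can be illustrated with a toy example. The sketch below is not the paper's evaluation code: it assumes a simplified unigram ROUGE-1 F1 with whitespace tokenization, and the function name `rouge1_f1` and all three summaries are hypothetical, invented purely for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased,
    whitespace-split tokens (no stemming, no stopword handling)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-token minimum of the two counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical summaries, not taken from ChartDiff.
reference = "sales rise in chart a but fall in chart b"
# Swaps "rise" and "fall": factually wrong, yet uses exactly the same words.
wrong_but_similar = "sales fall in chart a but rise in chart b"
# Says the right thing in different words.
right_but_reworded = "the first chart trends upward while the second declines"

score_wrong = rouge1_f1(wrong_but_similar, reference)
score_right = rouge1_f1(right_but_reworded, reference)
print(f"wrong but lexically similar: {score_wrong:.3f}")
print(f"right but reworded:          {score_right:.3f}")
```

In this toy case the factually reversed summary scores higher than the correct paraphrase, because unigram overlap ignores which chart each trend is attached to. That is one plausible way a method could obtain higher ROUGE while being judged lower in quality.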

Comments: 21 pages, 17 figures

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.28902 [cs.AI]

(or arXiv:2603.28902v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.28902

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Rongtian Ye
[v1] Mon, 30 Mar 2026 18:29:02 UTC (3,295 KB)
