
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

arXiv · March 30, 2026


Authors: Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang


Abstract: Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce RealChart2Code, a new large-scale benchmark with over 2,800 instances grounded in authentic datasets and featuring tasks with clear analytical intent. Crucially, it is the first benchmark to systematically evaluate chart generation from large-scale raw data and assess iterative code refinement in a multi-turn conversational setting. Our comprehensive evaluation of 14 leading VLMs on RealChart2Code reveals significant performance degradation compared to simpler benchmarks, highlighting their struggles with complex plot structures and authentic data. Our analysis uncovers a substantial performance gap between proprietary and open-weight models and confirms that even state-of-the-art VLMs often fail to accurately replicate intricate, multi-panel charts. These findings provide valuable insights into the current limitations of VLMs and guide future research directions. We release the benchmark and code at this https URL.

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.25804 [cs.CL] (or arXiv:2603.25804v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25804

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jiajun Zhang
[v1] Thu, 26 Mar 2026 18:11:46 UTC (33,575 KB)
