RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

arXiv, March 30, 2026

Authors: Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang

Abstract: Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce RealChart2Code, a new large-scale benchmark with over 2,800 instances grounded in authentic datasets and featuring tasks with clear analytical intent. Crucially, it is the first benchmark to systematically evaluate chart generation from large-scale raw data and assess iterative code refinement in a multi-turn conversational setting. Our comprehensive evaluation of 14 leading VLMs on RealChart2Code reveals significant performance degradation compared to simpler benchmarks, highlighting their struggles with complex plot structures and authentic data. Our analysis uncovers a substantial performance gap between proprietary and open-weight models and confirms that even state-of-the-art VLMs often fail to accurately replicate intricate, multi-panel charts. These findings provide valuable insights into the current limitations of VLMs and guide future research directions. We release the benchmark and code at this https URL.

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.25804 [cs.CL]

(or arXiv:2603.25804v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25804

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jiajun Zhang
[v1] Thu, 26 Mar 2026 18:11:46 UTC (33,575 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25804
