RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation
Hey there, little explorer! 🚀
Imagine you have a super smart robot friend! This robot can look at a picture, like a drawing of how many cookies you ate each day. 🍪
Now, these smart robots are learning to draw their own pictures, like charts, from real numbers. But guess what? Sometimes it's tricky!
A new game was made to test them. It's like giving them a big puzzle with lots of numbers and asking them to draw a super fancy chart, like a big castle made of blocks. 🏰
Some robots are really good at it, like the ones from big companies. But other robots, the ones anyone can play with, find it a bit harder. They sometimes make mistakes on the tricky parts of the castle.
So, scientists are learning how to help all the robot friends draw even better pictures! Yay for smart robots! 🎉
Vision-Language Models face significant challenges in generating complex multi-panel charts from real-world data, as demonstrated by a new large-scale benchmark that reveals performance gaps between proprietary and open-weight models. (8 upvotes on HuggingFace)
Published on Mar 26
·
Submitted by
zjj
on Mar 30
Authors:
,
,
,
,
,
,
,
,
,
,
,
,
,
,
Abstract
Vision-Language Models face significant challenges in generating complex multi-panel charts from real-world data, as demonstrated by a new large-scale benchmark that reveals performance gaps between proprietary and open-weight models.
AI-generated summary
Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce \texttt{RealChart2Code}, a new large-scale benchmark with over 2,800 instances grounded in authentic datasets and featuring tasks with clear analytical intent. Crucially, it is the first benchmark to systematically evaluate chart generation from large-scale raw data and assess iterative code refinement in a multi-turn conversational setting. Our comprehensive evaluation of 14 leading VLMs on RealChart2Code reveals significant performance degradation compared to simpler benchmarks, highlighting their struggles with complex plot structures and authentic data. Our analysis uncovers a substantial performance gap between proprietary and open-weight models and confirms that even state-of-the-art VLMs often fail to accurately replicate intricate, multi-panel charts. These findings provide valuable insights into the current limitations of VLMs and guide future research directions. We release the benchmark and code at https://github.com/Speakn0w/RealChart2Code.
View arXiv page View PDF Project page GitHub 8 Add to collection
Get this paper in your agent:
hf papers read 2603.25804
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 0
No model linking this paper
Cite arxiv.org/abs/2603.25804 in a model README.md to link it from this page.
Datasets citing this paper 0
No dataset linking this paper
Cite arxiv.org/abs/2603.25804 in a dataset README.md to link it from this page.
Spaces citing this paper 0
No Space linking this paper
Cite arxiv.org/abs/2603.25804 in a Space README.md to link it from this page.
Collections including this paper 1
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code
arXiv:2604.02352v1 Announce Type: new Abstract: Although LLMs are capable of generating functionally correct code, they also tend to produce less energy-efficient code in comparison to human-written solutions. As these inefficiencies lead to higher computational overhead, they are in direct conflict with Green Software Development (GSD) efforts, which aim to reduce the energy consumption of code. To support these efforts, this study aims to investigate whether and how LLMs can be optimized to promote the generation of energy-efficient code. To this end, we employ Contrastive Prompt Tuning (CPT). CPT combines Contrastive Learning techniques, which help the model to distinguish between efficient and inefficient code, and Prompt Tuning, a Parameter-Efficient Fine Tuning (PEFT) approach that r

Differentiable Symbolic Planning: A Neural Architecture for Constraint Reasoning with Learned Feasibility
arXiv:2604.02350v1 Announce Type: new Abstract: Neural networks excel at pattern recognition but struggle with constraint reasoning -- determining whether configurations satisfy logical or physical constraints. We introduce Differentiable Symbolic Planning (DSP), a neural architecture that performs discrete symbolic reasoning while remaining fully differentiable. DSP maintains a feasibility channel (phi) that tracks constraint satisfaction evidence at each node, aggregates this into a global feasibility signal (Phi) through learned rule-weighted combination, and uses sparsemax attention to achieve exact-zero discrete rule selection. We integrate DSP into a Universal Cognitive Kernel (UCK) that combines graph attention with iterative constraint propagation. Evaluated on three constraint rea

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
arXiv:2604.02349v1 Announce Type: new Abstract: Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in various real-world applications. However, obtaining human feedback for preferences can be expensive and time-consuming, which forms a strong barrier for PbRL. In this work, we address the problem of low query efficiency in offline PbRL, pinpointing two primary reasons: inefficient exploration and overoptimization of learned reward functions. In response to these challenges, we propose a novel algorithm, \textbf{O}ffline \textbf{P}b\textbf{R}L via \textbf{I}n-\textbf{D}ataset \textbf{E}xploration (OPRIDE), designed to enhance the query efficiency of offline PbRL. OPRIDE consists of two key
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation
arXiv:2604.02557v1 Announce Type: new Abstract: Language models are known to exhibit various forms of cultural bias in decision-making tasks, yet much less is known about their degree of cultural familiarity in open-ended text generation tasks. In this paper, we introduce the task of culturally-adapted art description generation, where models describe artworks for audiences from different cultural groups who vary in their familiarity with the cultural symbols and narratives embedded in the artwork. To evaluate cultural competence in this pragmatic generation task, we propose a framework based on culturally grounded question answering. We find that base models are only marginally adequate for this task, but, through a pragmatic speaker model, we can improve simulated listener comprehension

Skeleton-based Coherence Modeling in Narratives
arXiv:2604.02451v1 Announce Type: new Abstract: Modeling coherence in text has been a task that has excited NLP researchers since a long time. It has applications in detecting incoherent structures and helping the author fix them. There has been recent work in using neural networks to extract a skeleton from one sentence, and then use that skeleton to generate the next sentence for coherent narrative story generation. In this project, we aim to study if the consistency of skeletons across subsequent sentences is a good metric to characterize the coherence of a given body of text. We propose a new Sentence/Skeleton Similarity Network (SSN) for modeling coherence across pairs of sentences, and show that this network performs much better than baseline similarity techniques like cosine similar

Lipschitz bounds for integral kernels
arXiv:2604.02887v1 Announce Type: new Abstract: Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Lipschitz continuity are closely related to robustness and stability guarantees. Despite their importance, explicit characterizations of the Lipschitz constant of kernel feature maps are available only in a limited number of cases. In this paper, we study the Lipschitz regularity of feature maps associated with integral kernels under differentiability assumptions. We first provide sufficient conditions ensuring Lipschitz continuity and derive explicit formulas for the corresponding Lipschitz constants. We then identify a condition under which the feature map fails to be Lipschitz continuous and

State estimations and noise identifications with intermittent corrupted observations via Bayesian variational inference
arXiv:2604.02738v1 Announce Type: new Abstract: This paper focuses on the state estimation problem in distributed sensor networks, where intermittent packet dropouts, corrupted observations, and unknown noise covariances coexist. To tackle this challenge, we formulate the joint estimation of system states, noise parameters, and network reliability as a Bayesian variational inference problem, and propose a novel variational Bayesian adaptive Kalman filter (VB-AKF) to approximate the joint posterior probability densities of the latent parameters. Unlike existing AKF that separately handle missing data and measurement outliers, the proposed VB-AKF adopts a dual-mask generative model with two independent Bernoulli random variables, explicitly characterizing both observable communication losses

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!