Research Papers research paper arxiv Vision-Language Models chart generation multi-panel visualization

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

HuggingFace Papersby Jiajun Zhang ,March 26, 20262 min read2 views

🧒Explain Like I'm 5Simple language

Hey there, little explorer! 🚀

Imagine you have a super smart robot friend! This robot can look at a picture, like a drawing of how many cookies you ate each day. 🍪

Now, these smart robots are learning to draw their own pictures, like charts, from real numbers. But guess what? Sometimes it's tricky!

A new game was made to test them. It's like giving them a big puzzle with lots of numbers and asking them to draw a super fancy chart, like a big castle made of blocks. 🏰

Some robots are really good at it, like the ones from big companies. But other robots, the ones anyone can play with, find it a bit harder. They sometimes make mistakes on the tricky parts of the castle.

So, scientists are learning how to help all the robot friends draw even better pictures! Yay for smart robots! 🎉

Vision-Language Models face significant challenges in generating complex multi-panel charts from real-world data, as demonstrated by a new large-scale benchmark that reveals performance gaps between proprietary and open-weight models. (8 upvotes on HuggingFace)

Published on Mar 26

Submitted by

zjj

on Mar 30

Authors:

Abstract

AI-generated summary

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce \texttt{RealChart2Code}, a new large-scale benchmark with over 2,800 instances grounded in authentic datasets and featuring tasks with clear analytical intent. Crucially, it is the first benchmark to systematically evaluate chart generation from large-scale raw data and assess iterative code refinement in a multi-turn conversational setting. Our comprehensive evaluation of 14 leading VLMs on RealChart2Code reveals significant performance degradation compared to simpler benchmarks, highlighting their struggles with complex plot structures and authentic data. Our analysis uncovers a substantial performance gap between proprietary and open-weight models and confirms that even state-of-the-art VLMs often fail to accurately replicate intricate, multi-panel charts. These findings provide valuable insights into the current limitations of VLMs and guide future research directions. We release the benchmark and code at https://github.com/Speakn0w/RealChart2Code.

View arXiv page View PDF Project page GitHub 8 Add to collection

Get this paper in your agent:

hf papers read 2603.25804

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.25804 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.25804 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.25804 in a Space README.md to link it from this page.

Collections including this paper 1

Original source

HuggingFace Papers

https://huggingface.co/papers/2603.25804

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

ModelsLive

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code

arXiv:2604.02352v1 Announce Type: new Abstract: Although LLMs are capable of generating functionally correct code, they also tend to produce less energy-efficient code in comparison to human-written solutions. As these inefficiencies lead to higher computational overhead, they are in direct conflict with Green Software Development (GSD) efforts, which aim to reduce the energy consumption of code. To support these efforts, this study aims to investigate whether and how LLMs can be optimized to promote the generation of energy-efficient code. To this end, we employ Contrastive Prompt Tuning (CPT). CPT combines Contrastive Learning techniques, which help the model to distinguish between efficient and inefficient code, and Prompt Tuning, a Parameter-Efficient Fine Tuning (PEFT) approach that r

arXiv cs.LG

1mabout 1 hour ago

ModelsLive

Differentiable Symbolic Planning: A Neural Architecture for Constraint Reasoning with Learned Feasibility

arXiv:2604.02350v1 Announce Type: new Abstract: Neural networks excel at pattern recognition but struggle with constraint reasoning -- determining whether configurations satisfy logical or physical constraints. We introduce Differentiable Symbolic Planning (DSP), a neural architecture that performs discrete symbolic reasoning while remaining fully differentiable. DSP maintains a feasibility channel (phi) that tracks constraint satisfaction evidence at each node, aggregates this into a global feasibility signal (Phi) through learned rule-weighted combination, and uses sparsemax attention to achieve exact-zero discrete rule selection. We integrate DSP into a Universal Cognitive Kernel (UCK) that combines graph attention with iterative constraint propagation. Evaluated on three constraint rea

arXiv cs.LG

1mabout 1 hour ago

ProductsLive

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

arXiv:2604.02349v1 Announce Type: new Abstract: Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in various real-world applications. However, obtaining human feedback for preferences can be expensive and time-consuming, which forms a strong barrier for PbRL. In this work, we address the problem of low query efficiency in offline PbRL, pinpointing two primary reasons: inefficient exploration and overoptimization of learned reward functions. In response to these challenges, we propose a novel algorithm, \textbf{O}ffline \textbf{P}b\textbf{R}L via \textbf{I}n-\textbf{D}ataset \textbf{E}xploration (OPRIDE), designed to enhance the query efficiency of offline PbRL. OPRIDE consists of two key

arXiv cs.LG

1mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 275 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersLive

Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation

arXiv:2604.02557v1 Announce Type: new Abstract: Language models are known to exhibit various forms of cultural bias in decision-making tasks, yet much less is known about their degree of cultural familiarity in open-ended text generation tasks. In this paper, we introduce the task of culturally-adapted art description generation, where models describe artworks for audiences from different cultural groups who vary in their familiarity with the cultural symbols and narratives embedded in the artwork. To evaluate cultural competence in this pragmatic generation task, we propose a framework based on culturally grounded question answering. We find that base models are only marginally adequate for this task, but, through a pragmatic speaker model, we can improve simulated listener comprehension

arXiv cs.CL

1mabout 1 hour ago

Research PapersLive

Skeleton-based Coherence Modeling in Narratives

arXiv:2604.02451v1 Announce Type: new Abstract: Modeling coherence in text has been a task that has excited NLP researchers since a long time. It has applications in detecting incoherent structures and helping the author fix them. There has been recent work in using neural networks to extract a skeleton from one sentence, and then use that skeleton to generate the next sentence for coherent narrative story generation. In this project, we aim to study if the consistency of skeletons across subsequent sentences is a good metric to characterize the coherence of a given body of text. We propose a new Sentence/Skeleton Similarity Network (SSN) for modeling coherence across pairs of sentences, and show that this network performs much better than baseline similarity techniques like cosine similar

arXiv cs.CL

1mabout 1 hour ago

Research PapersLive

Lipschitz bounds for integral kernels

arXiv:2604.02887v1 Announce Type: new Abstract: Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Lipschitz continuity are closely related to robustness and stability guarantees. Despite their importance, explicit characterizations of the Lipschitz constant of kernel feature maps are available only in a limited number of cases. In this paper, we study the Lipschitz regularity of feature maps associated with integral kernels under differentiability assumptions. We first provide sufficient conditions ensuring Lipschitz continuity and derive explicit formulas for the corresponding Lipschitz constants. We then identify a condition under which the feature map fails to be Lipschitz continuous and

arXiv stat.ML

2mabout 1 hour ago

Research PapersLive

State estimations and noise identifications with intermittent corrupted observations via Bayesian variational inference

arXiv:2604.02738v1 Announce Type: new Abstract: This paper focuses on the state estimation problem in distributed sensor networks, where intermittent packet dropouts, corrupted observations, and unknown noise covariances coexist. To tackle this challenge, we formulate the joint estimation of system states, noise parameters, and network reliability as a Bayesian variational inference problem, and propose a novel variational Bayesian adaptive Kalman filter (VB-AKF) to approximate the joint posterior probability densities of the latent parameters. Unlike existing AKF that separately handle missing data and measurement outliers, the proposed VB-AKF adopts a dual-mask generative model with two independent Bernoulli random variables, explicitly characterizing both observable communication losses

arXiv stat.ML

1mabout 1 hour ago