
AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks

arXiv, March 31, 2026

arXiv:2405.13580v2 Announce Type: replace

Authors: Omar Moured, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen

This paper has been withdrawn by Omar Moured

Abstract: Chart summarization is a crucial task for blind and visually impaired individuals, as it is their primary means of accessing and interpreting graphical data. Crafting high-quality descriptions is challenging because it requires precise communication of the chart's essential details to readers who cannot perceive it visually. Many chart analysis methods, however, produce brief, unstructured responses that may contain significant hallucinations, which undermines their reliability for blind people. To address these challenges, this work presents three key contributions: (1) We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary that features long-context, semantically rich annotations. (2) We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations through training with multiple pretext tasks, yielding a performance gain of $\sim 2.5\%$. (3) We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are. Our dataset and code are publicly available on our project page: this https URL.

Comments: Concerns about reproducibility of the training results and dataset availability

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Cite as: arXiv:2405.13580 [cs.CV]

(or arXiv:2405.13580v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2405.13580

arXiv-issued DOI via DataCite

Submission history

From: Omar Moured
[v1] Wed, 22 May 2024 12:18:52 UTC (584 KB)
[v2] Sun, 29 Mar 2026 10:37:46 UTC (1 KB) (withdrawn)
