FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures

arXiv · March 31, 2026

arXiv:2601.08026v4 Announce Type: replace-cross

Authors: Jifeng Song, Arun Das, Pan Wang, Hui Ji, Kun Zhao, Yufei Huang


Abstract: Scientific compound figures combine multiple labeled panels into a single image. However, in a PMC-scale crawl of 346,567 compound figures, 16.3% have no caption and 1.8% have only captions shorter than ten words, so existing caption-decomposition pipelines discard them. We propose FigEx2, a visual-conditioned framework that localizes panels and generates panel-wise captions directly from the image, converting otherwise unusable figures into aligned panel-text pairs for downstream pretraining and retrieval. To mitigate linguistic variance in open-ended captioning, we introduce a noise-aware gated fusion module that adaptively controls how caption features condition the detection query space, and we employ a staged SFT+RL strategy with CLIP-based alignment and BERTScore-based semantic rewards. To support high-quality supervision, we curate BioSci-Fig-Cap, a refined benchmark for panel-level grounding, alongside cross-disciplinary test suites in physics and chemistry. FigEx2 achieves 0.728 mAP@0.5:0.95 for detection, outperforms Qwen3-VL-8B by 0.44 in METEOR and 0.22 in BERTScore, and transfers zero-shot to out-of-distribution scientific domains without fine-tuning.
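The abstract does not specify how the noise-aware gated fusion module is built. As a rough illustration only, the sketch below shows a generic per-dimension sigmoid gate that decides how much of a caption feature vector is added to a detection query vector. The diagonal weights (`w_query`, `w_caption`, `bias`) and the `noise` score that scales the gate down for unreliable captions are hypothetical choices, not the authors' design.

```python
import math
from typing import List, Sequence


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    query: Sequence[float],
    caption: Sequence[float],
    w_query: Sequence[float],
    w_caption: Sequence[float],
    bias: Sequence[float],
    noise: float = 0.0,
) -> List[float]:
    """Fuse caption features into a detection query with a per-dimension gate.

    Hypothetical parametrization (not taken from the paper):
        g_i   = (1 - noise) * sigmoid(w_query[i] * q_i + w_caption[i] * c_i + bias[i])
        out_i = q_i + g_i * c_i
    `noise` in [0, 1] is an assumed caption-unreliability score; noise = 1
    closes the gate entirely, so the query passes through unchanged.
    """
    dims = {len(query), len(caption), len(w_query), len(w_caption), len(bias)}
    if len(dims) != 1:
        raise ValueError("all vectors must share one dimension")
    if not 0.0 <= noise <= 1.0:
        raise ValueError("noise must lie in [0, 1]")
    fused = []
    for q, c, wq, wc, b in zip(query, caption, w_query, w_caption, bias):
        # Gate in (0, 1), shrunk toward 0 as the assumed noise score grows.
        gate = (1.0 - noise) * stable_sigmoid(wq * q + wc * c + b)
        fused.append(q + gate * c)
    return fused


# Zero weights and bias give gate = 0.5 in every dimension.
print(gated_fusion([1.0, 2.0], [0.5, -0.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))  # prints [1.25, 1.75]
```

In a trained model the weights would be learned, so the network itself can push the gate toward 0 when caption features are noisy and toward 1 when they help localize panels.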
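The abstract names a CLIP-based alignment reward and a BERTScore-based semantic reward for the RL stage but does not say how they are combined. The sketch below assumes a simple weighted sum with a hypothetical weight `alpha`. The alignment term is a CLIP-style cosine similarity between an image embedding and a caption embedding, and the semantic term is greedy-matching BERTScore F1 (without its optional idf weighting and baseline rescaling). The embeddings are plain vectors supplied by the caller; in practice they would come from a CLIP model and a BERT-family encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bertscore_f1(candidate: List[Vector], reference: List[Vector]) -> float:
    """Greedy-matching BERTScore F1 over token embeddings.

    Each candidate token is matched to its most similar reference token
    (precision), and each reference token to its most similar candidate
    token (recall).
    """
    if not candidate or not reference:
        return 0.0
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    if precision + recall <= 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def caption_reward(
    image_emb: Vector,
    caption_emb: Vector,
    cand_tokens: List[Vector],
    ref_tokens: List[Vector],
    alpha: float = 0.5,
) -> float:
    """Assumed reward: alpha * CLIP-style alignment + (1 - alpha) * BERTScore F1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    alignment = cosine(image_emb, caption_emb)
    semantic = bertscore_f1(cand_tokens, ref_tokens)
    return alpha * alignment + (1.0 - alpha) * semantic


# Toy 2-D embeddings: the candidate covers one of two reference tokens.
print(round(bertscore_f1([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]), 4))
```

Mixing the two terms lets the alignment reward keep a generated caption tied to the visual panel while the BERTScore term credits paraphrases of the reference caption that exact-match metrics would penalize.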

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2601.08026 [cs.CV]

(or arXiv:2601.08026v4 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2601.08026

arXiv-issued DOI via DataCite

Submission history

From: Jifeng Song
[v1] Mon, 12 Jan 2026 21:57:52 UTC (4,239 KB)
[v2] Wed, 14 Jan 2026 15:49:01 UTC (4,238 KB)
[v3] Wed, 25 Feb 2026 13:52:52 UTC (4,238 KB)
[v4] Mon, 30 Mar 2026 15:19:06 UTC (4,237 KB)
