Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessAI shutdown controls may not work as expected, new study suggests - ComputerworldGoogle News: Generative AI27 questions to ask when choosing an LLM - InfoWorldGoogle News: LLMJapan, driven by labor shortages, is increasingly adopting robotics and physical AI, with a hybrid model where startups innovate and corporations provide scale (Kate Park/TechCrunch)TechmemeAnthropic tells OpenClaw users to pay up - The Rundown AIGoogle News: ClaudeANALYSIS: Q1 IPOs ‘Forge’ Ahead as OpenAI, SpaceX Look to Debuts - Bloomberg Law NewsGoogle News: OpenAINew track in artificial intelligence added to Arkansas Tech University curriculum - River Valley Democrat-GazetteGoogle News: AIDeepMind Calls for New Safeguards Against AI Agent Exploitation - The420.inGoogle News: DeepMindChatGPT web service hit by brief disruption, OpenAI investigates - news.cgtn.comGoogle News: ChatGPTAgile Robots and Google DeepMind Partner on AI-Driven Industrial Robotics - ARC AdvisoryGoogle News: DeepMind40 Days of Building HarshAI: What I Learned About AI AutomationDEV CommunityMoving fast with agents without losing comprehensionDEV CommunityCharlie's Chocolate Factory Paperclip — Ep.1DEV CommunityBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessAI shutdown controls may not work as expected, new study suggests - ComputerworldGoogle News: Generative AI27 questions to ask when choosing an LLM - InfoWorldGoogle News: LLMJapan, driven by labor shortages, is increasingly adopting robotics and physical AI, with a hybrid model where startups innovate and corporations provide scale (Kate Park/TechCrunch)TechmemeAnthropic tells OpenClaw users to pay up - The Rundown AIGoogle News: ClaudeANALYSIS: Q1 IPOs ‘Forge’ Ahead as OpenAI, SpaceX Look to Debuts - Bloomberg Law NewsGoogle News: OpenAINew track in artificial intelligence added to Arkansas Tech University curriculum - River Valley Democrat-GazetteGoogle News: AIDeepMind Calls for New Safeguards Against AI Agent Exploitation - The420.inGoogle News: DeepMindChatGPT web service hit by brief disruption, OpenAI investigates - news.cgtn.comGoogle News: ChatGPTAgile Robots and Google DeepMind Partner on AI-Driven Industrial Robotics - ARC AdvisoryGoogle News: DeepMind40 Days of Building HarshAI: What I Learned About AI AutomationDEV CommunityMoving fast with agents without losing comprehensionDEV CommunityCharlie's Chocolate Factory Paperclip — Ep.1DEV Community
AI NEWS HUBbyEIGENVECTOREigenvector

CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning

arXivMarch 31, 202610 min read0 views
Source Quiz

arXiv:2602.21655v2 Announce Type: replace-cross Abstract: Image captioning remains a fundamental task for vision language understanding, yet ground-truth supervision still relies predominantly on human-annotated references. Because human annotations reflect subjective preferences and expertise, ground-truth captions are often incomplete or even incorrect, which in turn limits caption models. We argue that caption quality should be assessed by two objective aspects: completeness (does the caption cover all salient visual facts?) and correctness (are the descriptions true with respect to the ima — Zhijiang Tang, Linhua Wang, Jiaxin Qi, Weihao Jiang, Peng Hou, Anxiang Zeng, Jianqiang Huang

View PDF HTML (experimental)

Abstract:Image captioning remains a fundamental task for vision language understanding, yet ground-truth supervision still relies predominantly on human-annotated references. Because human annotations reflect subjective preferences and expertise, ground-truth captions are often incomplete or even incorrect, which in turn limits caption models. We argue that caption quality should be assessed by two objective aspects: completeness (does the caption cover all salient visual facts?) and correctness (are the descriptions true with respect to the image?). To this end, we introduce CCCaption: a dual-reward reinforcement learning framework with a dedicated fine-tuning corpus that explicitly optimizes these properties to generate \textbf{C}omplete and \textbf{C}orrect \textbf{Captions}. For completeness, we use diverse LVLMs to disentangle the image into a set of visual queries, and reward captions that answer more of these queries, with a dynamic query sampling strategy to improve training efficiency. For correctness, we penalize captions that contain hallucinations by validating the authenticity of sub-caption queries, which are derived from the caption decomposition. Our symmetric dual-reward optimization jointly maximizes completeness and correctness, guiding models toward captions that better satisfy these objective criteria. Extensive experiments across standard captioning benchmarks show consistent improvements, offering a principled path to training caption models beyond human-annotation imitation.

Comments: Accept by CVPR 2026

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.21655 [cs.CV]

(or arXiv:2602.21655v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2602.21655

arXiv-issued DOI via DataCite

Submission history

From: Zhijiang Tang [view email] [v1] Wed, 25 Feb 2026 07:34:26 UTC (759 KB) [v2] Sat, 28 Mar 2026 14:51:55 UTC (759 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
CCCaption: …researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 244 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers