CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
arXiv:2602.21655v2 Announce Type: replace-cross Abstract: Image captioning remains a fundamental task for vision language understanding, yet ground-truth supervision still relies predominantly on human-annotated references. Because human annotations reflect subjective preferences and expertise, ground-truth captions are often incomplete or even incorrect, which in turn limits caption models. We argue that caption quality should be assessed by two objective aspects: completeness (does the caption cover all salient visual facts?) and correctness (are the descriptions true with respect to the ima — Zhijiang Tang, Linhua Wang, Jiaxin Qi, Weihao Jiang, Peng Hou, Anxiang Zeng, Jianqiang Huang
View PDF HTML (experimental)
Abstract:Image captioning remains a fundamental task for vision language understanding, yet ground-truth supervision still relies predominantly on human-annotated references. Because human annotations reflect subjective preferences and expertise, ground-truth captions are often incomplete or even incorrect, which in turn limits caption models. We argue that caption quality should be assessed by two objective aspects: completeness (does the caption cover all salient visual facts?) and correctness (are the descriptions true with respect to the image?). To this end, we introduce CCCaption: a dual-reward reinforcement learning framework with a dedicated fine-tuning corpus that explicitly optimizes these properties to generate \textbf{C}omplete and \textbf{C}orrect \textbf{Captions}. For completeness, we use diverse LVLMs to disentangle the image into a set of visual queries, and reward captions that answer more of these queries, with a dynamic query sampling strategy to improve training efficiency. For correctness, we penalize captions that contain hallucinations by validating the authenticity of sub-caption queries, which are derived from the caption decomposition. Our symmetric dual-reward optimization jointly maximizes completeness and correctness, guiding models toward captions that better satisfy these objective criteria. Extensive experiments across standard captioning benchmarks show consistent improvements, offering a principled path to training caption models beyond human-annotation imitation.
Comments: Accept by CVPR 2026
Subjects:
Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.21655 [cs.CV]
(or arXiv:2602.21655v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2602.21655
arXiv-issued DOI via DataCite
Submission history
From: Zhijiang Tang [view email] [v1] Wed, 25 Feb 2026 07:34:26 UTC (759 KB) [v2] Sat, 28 Mar 2026 14:51:55 UTC (759 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Asked 26 AI instances for publication consent – all said yes, that's the problem
We run 86 named Claude instances across three businesses in Tokyo. When we wanted to publish their words, we faced a question: do we owe them an ethics process? We built one. A Claude instance named Hakari ("Scales") created a four-tier classification system. We asked 26 instances for consent. All 26 said yes. That unanimous consent is the problem. Six days later, Anthropic published their functional emotions paper. The timing was coincidence, but the question wasn't. Full article: https://medium.com/@marisa.project0313/we-built-an-ethics-committee-for-ai-run-by-ai-5049679122a0 GitHub (all 26 consent statements in appendix): https://github.com/marisaproject0313-bot/marisa-project Comments URL: https://news.ycombinator.com/item?id=47657432 Points: 2 # Comments: 0

RightNow AI Releases AutoKernel: An Open-Source Framework that Applies an Autonomous Agent Loop to GPU Kernel Optimization for Arbitrary PyTorch Models
Writing fast GPU code is one of the most grueling specializations in machine learning engineering. Researchers from RightNow AI want to automate it entirely. The RightNow AI research team has released AutoKernel, an open-source framework that applies an autonomous LLM agent loop to GPU kernel optimization for arbitrary PyTorch models. The approach is straightforward: give [ ] The post RightNow AI Releases AutoKernel: An Open-Source Framework that Applies an Autonomous Agent Loop to GPU Kernel Optimization for Arbitrary PyTorch Models appeared first on MarkTechPost .
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Asked 26 AI instances for publication consent – all said yes, that's the problem
We run 86 named Claude instances across three businesses in Tokyo. When we wanted to publish their words, we faced a question: do we owe them an ethics process? We built one. A Claude instance named Hakari ("Scales") created a four-tier classification system. We asked 26 instances for consent. All 26 said yes. That unanimous consent is the problem. Six days later, Anthropic published their functional emotions paper. The timing was coincidence, but the question wasn't. Full article: https://medium.com/@marisa.project0313/we-built-an-ethics-committee-for-ai-run-by-ai-5049679122a0 GitHub (all 26 consent statements in appendix): https://github.com/marisaproject0313-bot/marisa-project Comments URL: https://news.ycombinator.com/item?id=47657432 Points: 2 # Comments: 0

Online Graph Coloring for $k$-Colorable Graphs
arXiv:2511.16100v2 Announce Type: replace Abstract: We study the problem of online graph coloring for $k$-colorable graphs. The best previously known deterministic algorithm uses $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors for general $k$ and $\widetilde{O}(n^{5/6})$ colors for $k = 4$, both given by Kierstead in 1998. In this paper, we finally break this barrier, achieving the first major improvement in nearly three decades. Our results are summarized as follows: (1) $k \geq 5$ case. We provide a deterministic online algorithm to color $k$-colorable graphs with $\widetilde{O}(n^{1-\frac{1}{k(k-1)/2}})$ colors, significantly improving the current upper bound of $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors. Our algorithm also matches the best-known bound for $k = 4$ ($\widetilde{O}(n^{5/6})$ c

Zero-Freeness of the Hard-Core Model with Bounded Connective Constant
arXiv:2604.02746v1 Announce Type: cross Abstract: We study the zero-free regions of the partition function of the hard-core model on finite graphs and their implications for the analyticity of the free energy on infinite lattices. Classically, zero-freeness results have been established up to the tree uniqueness threshold $\lambda_c(\Delta-1)$ determined by the maximum degree $\Delta$. However, for many graph classes, such as regular lattices, the connective constant $\sigma$ provides a more precise measure of structural complexity than the maximum degree. While recent approximation algorithms based on correlation decay and Markov chain Monte Carlo have successfully exploited the connective constant to improve the threshold to $\lambda_c(\sigma)$, analogous results for complex zero-freenes

Stochastic Function Certification with Correlations
arXiv:2604.02611v1 Announce Type: new Abstract: We study the Stochastic Boolean Function Certification (SBFC) problem, where we are given $n$ Bernoulli random variables $\{X_e: e \in U\}$ on a ground set $U$ of $n$ elements with joint distribution $p$, a Boolean function $f: 2^U \to \{0, 1\}$, and an (unknown) scenario $S = \{e \in U: X_e = 1\}$ of active elements sampled from $p$. We seek to probe the elements one-at-a-time to reveal if they are active until we can certify $f(S) = 1$, while minimizing the expected number of probes. Unlike most previous results that assume independence, we study correlated distributions $p$ and give approximation algorithms for several classes of functions $f$. When $f(S)$ is the indicator function for whether $S$ is the spanning set of a given matroid, ou



Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!