Research Papers research paper arxiv computer-vision image-recognition

Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models

arXivMarch 31, 20262 min read0 views

arXiv:2603.27201v1 Announce Type: new Abstract: Multimodal Chain-of-Thought (MCoT) models have demonstrated impressive capability in complex visual reasoning tasks. Unfortunately, recent studies reveal that they suffer from severe hallucination problems due to diminished visual attention during the generation process. However, visual attention decay is a well-studied problem in Large Vision-Language Models (LVLMs). Considering the fundamental differences in reasoning processes between MCoT models and traditional LVLMs, we raise a basic question: Whether MCoT models have unique causes of halluc — Ji Ma, Wei Suo, Peng Wang, Yanning Zhang

View PDF HTML (experimental)

Abstract:Multimodal Chain-of-Thought (MCoT) models have demonstrated impressive capability in complex visual reasoning tasks. Unfortunately, recent studies reveal that they suffer from severe hallucination problems due to diminished visual attention during the generation process. However, visual attention decay is a well-studied problem in Large Vision-Language Models (LVLMs). Considering the fundamental differences in reasoning processes between MCoT models and traditional LVLMs, we raise a basic question: Whether MCoT models have unique causes of hallucinations? To answer this question, we systematically investigate the hallucination patterns of MCoT models and find that fabricated texts are primarily generated in associative reasoning steps, which we term divergent thinking. Leveraging these insights, we introduce a simple yet effective strategy that can effectively localize divergent thinking steps and intervene in the decoding process to mitigate hallucinations. Extensive experiments show that our method outperforms existing methods by a large margin. More importantly, our proposed method can be conveniently integrated with other hallucination mitigation methods and further boost their performance. The code is publicly available at this https URL.

Comments: CVPR 2026

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27201 [cs.CV]

(or arXiv:2603.27201v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27201

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Ji Ma [view email] [v1] Sat, 28 Mar 2026 08:56:19 UTC (4,378 KB)

Original source

arXiv

https://arxiv.org/abs/2603.27201

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

ModelsFresh

Speech LLMs are Contextual Reasoning Transcribers

arXiv:2604.00610v1 Announce Type: new Abstract: Despite extensions to speech inputs, effectively leveraging the rich knowledge and contextual understanding of large language models (LLMs) in automatic speech recognition (ASR) remains non-trivial, as the task primarily involves direct speech-to-text mapping. To address this, this paper proposes chain-of-thought ASR (CoT-ASR), which constructs a reasoning chain that enables LLMs to first analyze the input speech and generate contextual analysis, thereby fully exploiting their generative capabilities. With this contextual reasoning, CoT-ASR then performs more informed speech recognition and completes both reasoning and transcription in a single pass. Moreover, CoT-ASR naturally supports user-guided transcription: while designed to self-genera

arXiv cs.CL

1mabout 4 hours ago

ModelsRecent

[P] Federated Adversarial Learning

I'm a CS/ML engineering student in my 4th year, and I need help for a project I recently got assigned to (as an "end of the year" project). I am familiar with basic ML stuff, deep learning etc and made a few "standard" projects here and there about it... However I found this topic a bit challenging since it combines both FL and the adversarial aspect, I did a lot of research especially on arxiv to try to understand the gist of it. HOWEVER, the subject is essentially "federated adversarial learning" and I am struggeling to understand what I'm supposed to do. (I found ONE article on arxiv but ngl i find it very hard to understand as it is very theoritical.) I talked to my teachers/supervisors about this but they said "do whatever you want" which doesn't help AT ALL..... They did provide a da

Reddit r/MachineLearning

2mabout 17 hours ago

ModelsRecent

[R] The SPORE Clustering Algorithm

https://preview.redd.it/di99yw56tksg1.png?width=992 format=png auto=webp s=8828c9459dcf8f8541718e4d7a9fae52bfc0b95a I created a clustering algorithm SPORE ( S keleton P ropagation O ver R ecalibrating E xpansions) for general purpose clustering, intended to handle nonconvex, convex, low-d and high-d data alike. I've benchmarked it on 28 datasets from 2-784D and released a Python package as well as a research paper . Short Summary SPORE is a density-variance-based method meant for general clustering in arbitrary geometries and dimensionalities. After building a knn graph, it has 2 phases. Phase 1 (Expansion) uses BFS with a continually refined density-variance constraint to expand initial clusters in a way that adapts to their specific scale. The aim is to capture inner, well-shielded skele

Reddit r/MachineLearning

5mabout 19 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 301 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersRecent

[D] Does seeing the identify of authors influence your scoring?

Let's be honest, at some stage of the review process. A lot of us have gotten bored and tried to Google the papers we are reviewing. And sometimes those papers might have already been uploaded onto arXiv with the identity of the authors. Which we then tried to look them up. As a first-time reviewer, I noticed the top 2 papers in my batch happened to be the only papers in my batch that is on arXiv. I am trying to work out if revealing the author's identity had influenced my decision. Or it's just a coincidence. submitted by /u/d_edge_sword [link] [comments]

Reddit r/MachineLearning

1mabout 20 hours ago

Research PapersFresh

Leveraging Commit Size Context and Hyper Co-Change Graph Centralities for Defect Prediction

arXiv:2604.01132v1 Announce Type: new Abstract: File-level defect prediction models traditionally rely on product and process metrics. While process metrics effectively complement product metrics, they often overlook commit size the number of files changed per commit despite its strong association with software quality. Network centrality measures on dependency graphs have also proven to be valuable product level indicators. Motivated by this, we first redefine process metrics as commit size aware process metric vectors, transforming conventional scalar measures into 100 dimensional profiles that capture the distribution of changes across commit size strata. We then model change history as a hyper co change graph, where hyperedges naturally encode commit-size semantics. Vector centralities

arXiv cs.SE

1mabout 4 hours ago

Research PapersFresh

Detecting Call Graph Unsoundness without Ground Truth

arXiv:2604.00885v1 Announce Type: new Abstract: Java static analysis frameworks are commonly compared under the assumption that analysis algorithms and configurations compose monotonically and yield semantically comparable results across tools. In this work, we show that this assumption is fundamentally flawed. We present a large-scale empirical study of semantic consistency within and across four widely used Java static analysis frameworks: Soot, SootUp, WALA, and Doop. Using precision partial orders over analysis algorithms and configurations, we systematically identify violations where increased precision introduces new call-graph edges or amplifies inconsistencies. Our results reveal three key findings. First, algorithmic precision orders frequently break within frameworks due to moder

arXiv cs.SE

1mabout 4 hours ago

Research PapersFresh

Containing the Reproducibility Gap: Automated Repository-Level Containerization for Scholarly Jupyter Notebooks

arXiv:2604.01072v1 Announce Type: new Abstract: Computational reproducibility is fundamental to trustworthy science, yet remains difficult to achieve in practice across various research workflows, including Jupyter notebooks published alongside scholarly articles. Environment drift, undocumented dependencies and implicit execution assumptions frequently prevent independent re-execution of published research. Despite existing reproducibility guidelines, scalable and systematic infrastructure for automated assessment remains limited. We present an automated, web-oriented reproducibility engineering pipeline that reconstructs and evaluates repository-level execution environments for scholarly notebooks. The system performs dependency inference, automated container generation, and isolated exe

arXiv cs.SE

2mabout 4 hours ago