MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

arXiv · March 31, 2026 · 10 min read

arXiv:2603.03192v2 · Announce Type: replace-cross
Authors: Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani


Abstract: Omni-modal large language models (omni LLMs) have recently achieved strong performance across audiovisual understanding tasks, yet they remain highly susceptible to cross-modal hallucinations arising from spurious correlations and dominant language priors. In this work, we propose Modality-Decoupled Direct Preference Optimization (MoD-DPO), a simple and effective framework for improving modality grounding in omni LLMs. MoD-DPO introduces modality-aware regularization terms that explicitly enforce invariance to corruptions in irrelevant modalities and sensitivity to perturbations in relevant modalities, thereby reducing unintended cross-modal interactions. To further mitigate over-reliance on textual priors, we incorporate a language-prior debiasing penalty that discourages hallucination-prone text-only responses. Extensive experiments across multiple audiovisual hallucination benchmarks demonstrate that MoD-DPO consistently improves perception accuracy and hallucination resistance, outperforming previous preference optimization baselines under similar training budgets. Our findings underscore the importance of modality-faithful alignment and demonstrate a scalable path toward more reliable and resilient multimodal foundation models.
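The abstract names three ingredients on top of DPO (invariance to irrelevant-modality corruptions, sensitivity to relevant-modality perturbations, and a language-prior penalty) but does not give their formulas. The sketch below is therefore only an illustration of how such terms could be combined for a single training example. The DPO term follows the standard published form; the three regularizers, the dictionary keys, and the weights lam_inv, lam_sens, and lam_lang are hypothetical stand-ins and may differ from what the paper actually uses. All inputs are sequence-level log-probabilities.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w, ref_w, pol_l, ref_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of log-probs."""
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def mod_dpo_sketch(lp, ref, beta=0.1, lam_inv=1.0, lam_sens=1.0, lam_lang=1.0):
    """Hypothetical combination of DPO with modality-aware terms.

    lp  : policy log-probs (keys below are illustrative, not from the paper)
    ref : reference-model log-probs for the chosen and rejected responses
    """
    base = dpo_loss(lp["chosen"], ref["chosen"], lp["rejected"], ref["rejected"], beta)

    # Assumed invariance term: corrupting the irrelevant modality should not
    # change the likelihood of the chosen response.
    invariance = (lp["chosen"] - lp["chosen_irrelevant_corrupted"]) ** 2

    # Assumed sensitivity term: corrupting the relevant modality should lower
    # the likelihood of the chosen response.
    sensitivity = -log_sigmoid(beta * (lp["chosen"] - lp["chosen_relevant_corrupted"]))

    # Assumed language-prior term: the chosen response should be more likely
    # with the audiovisual input than with text alone.
    language = -log_sigmoid(beta * (lp["chosen"] - lp["chosen_text_only"]))

    return base + lam_inv * invariance + lam_sens * sensitivity + lam_lang * language


ref = {"chosen": -6.0, "rejected": -6.0}
grounded = {
    "chosen": -5.0, "rejected": -9.0,
    "chosen_irrelevant_corrupted": -5.0,   # unaffected by irrelevant corruption
    "chosen_relevant_corrupted": -12.0,    # drops when evidence is removed
    "chosen_text_only": -11.0,             # not explained by text alone
}
ungrounded = {
    "chosen": -5.0, "rejected": -9.0,
    "chosen_irrelevant_corrupted": -9.0,   # swayed by the irrelevant modality
    "chosen_relevant_corrupted": -5.0,     # ignores the relevant modality
    "chosen_text_only": -5.0,              # fully explained by the text prior
}
print(mod_dpo_sketch(grounded, ref), mod_dpo_sketch(ungrounded, ref))
```

Both toy examples share the same DPO term, so any difference in the printed losses comes from the assumed regularizers; under these assumptions the modality-grounded example receives the lower loss.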

Comments: CVPR 2026. Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Cite as: arXiv:2603.03192 [cs.CV] (or arXiv:2603.03192v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.03192 (arXiv-issued DOI via DataCite)

Submission history

From: Ashutosh Chaubey
[v1] Tue, 3 Mar 2026 17:50:24 UTC (3,570 KB)
[v2] Fri, 27 Mar 2026 20:27:35 UTC (3,571 KB)
