Good Scores, Bad Data: A Metric for Multimodal Coherence
arXiv:2603.25924v1 Announce Type: cross. Author: Vasundra Srinivasan
Abstract: Multimodal AI systems are evaluated by downstream task accuracy, but high accuracy does not mean the underlying data is coherent. A model can score well on Visual Question Answering (VQA) while its inputs contradict each other. We introduce the Multimodal Coherence Score (MCS), a metric that evaluates fusion quality independent of any downstream model. MCS decomposes coherence into four dimensions (identity, spatial, semantic, and decision), with weights learned via Nelder-Mead optimization. We evaluate on 1,000 Visual Genome images using DETR, CLIP, and ViLT, and validate on 150 COCO images with no retraining. Across three fusion architectures, MCS discriminates quality with higher sensitivity than task accuracy alone (Spearman rho = 0.093 vs. 0.071). Perturbation experiments confirm each dimension responds independently to its failure mode with zero cross-talk. MCS is lightweight, requires no human annotation, and tells you not just that something broke, but what broke.
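To make the weighting step concrete, here is a minimal sketch of combining four per-dimension scores into a single number and fitting the weights with Nelder-Mead. The abstract does not state how the dimensions are aggregated or which objective is optimized, so the weighted sum, the mean-squared-error objective, the synthetic data, and every function name below (normalize, mcs, fit_weights, synthetic_data) are assumptions for illustration only, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import spearmanr

# Dimension names are taken from the abstract; everything else is assumed.
DIMENSIONS = ("identity", "spatial", "semantic", "decision")


def normalize(raw):
    """Map raw parameters to non-negative weights that sum to 1."""
    w = np.abs(np.asarray(raw, dtype=float))
    return w / w.sum()


def mcs(scores, weights):
    """Assumed aggregation: weighted sum of per-dimension scores, shape (n, 4)."""
    return np.asarray(scores) @ np.asarray(weights)


def fit_weights(scores, target):
    """Fit weights with Nelder-Mead against a hypothetical quality target.

    The mean-squared-error objective is an assumption; the abstract only
    says the weights are learned via Nelder-Mead optimization.
    """
    def loss(raw):
        return float(np.mean((mcs(scores, normalize(raw)) - target) ** 2))

    x0 = np.full(len(DIMENSIONS), 1.0 / len(DIMENSIONS))  # start from equal weights
    result = minimize(loss, x0, method="Nelder-Mead")
    return normalize(result.x)


def synthetic_data(n=200, seed=0):
    """Synthetic scores and target, standing in for real fusion outputs."""
    rng = np.random.default_rng(seed)
    scores = rng.uniform(0.0, 1.0, size=(n, len(DIMENSIONS)))
    true_w = np.array([0.4, 0.3, 0.2, 0.1])
    target = scores @ true_w + rng.normal(0.0, 0.01, size=n)
    return scores, target, true_w


if __name__ == "__main__":
    scores, target, true_w = synthetic_data()
    weights = fit_weights(scores, target)
    # Rank correlation between the fitted score and the target, in the
    # spirit of the Spearman comparison reported in the abstract.
    rho, _ = spearmanr(mcs(scores, weights), target)
    print(dict(zip(DIMENSIONS, np.round(weights, 3))), round(float(rho), 3))
```

Nelder-Mead is derivative-free, so an objective of this kind does not need to be differentiable. Because the per-dimension scores stay separate until the final weighted sum, a low overall score can be traced back to the dimension that dropped.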
Comments: 9 pages, 6 figures, NeurIPS 2024 format
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.25924 [cs.CV]
(or arXiv:2603.25924v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.25924
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Vasundra Srinivasan [v1] Thu, 26 Mar 2026 21:30:34 UTC (2,259 KB)
