
Good Scores, Bad Data: A Metric for Multimodal Coherence

arXiv · March 30, 2026

arXiv:2603.25924v1 · Announce Type: cross · Vasundra Srinivasan


Abstract: Multimodal AI systems are evaluated by downstream task accuracy, but high accuracy does not mean the underlying data is coherent. A model can score well on Visual Question Answering (VQA) while its inputs contradict each other. We introduce the Multimodal Coherence Score (MCS), a metric that evaluates fusion quality independent of any downstream model. MCS decomposes coherence into four dimensions (identity, spatial, semantic, and decision), with weights learned via Nelder-Mead optimization. We evaluate on 1,000 Visual Genome images using DETR, CLIP, and ViLT, and validate on 150 COCO images with no retraining. Across three fusion architectures, MCS discriminates quality with higher sensitivity than task accuracy alone (Spearman rho = 0.093 vs. 0.071). Perturbation experiments confirm that each dimension responds independently to its failure mode, with zero cross-talk. MCS is lightweight, requires no human annotation, and tells you not just that something broke, but what broke.

Comments: 9 pages, 6 figures, NeurIPS 2024 format

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.25924 [cs.CV] (or arXiv:2603.25924v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.25924

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Vasundra Srinivasan
[v1] Thu, 26 Mar 2026 21:30:34 UTC (2,259 KB)



