AI News Hub by Eigenvector

Omni-NegCLIP: Enhancing CLIP with Front-Layer Contrastive Fine-Tuning for Comprehensive Negation Understanding

arXiv cs.CV · Jingqi Xu · April 1, 2026



Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities across a wide range of multimodal tasks. However, recent studies have shown that VLMs such as CLIP perform poorly at understanding negation expressions, which are common in natural language. In this work, we propose Omni-NegCLIP, a fine-tuned CLIP model that modifies CLIP's original InfoNCE contrastive loss to improve its understanding of two types of negation: presence-based negation (negated mentions of objects that are actually present in an image) and absence-based negation (negated mentions of objects that could plausibly appear in an image but are in fact absent). Specifically, we design a presence-based contrastive objective that pulls image embeddings closer to their original caption embeddings while pushing them away from the corresponding presence-based negated caption embeddings, and an absence-based contrastive objective that aligns image embeddings with both the original and the absence-based negated caption embeddings while maintaining a semantic distinction between the two text embeddings. Based on our observation that the front transformer layers of the CLIP text encoder learn negated text better than the later layers, we fine-tune only the front transformer layers of the text encoder at each training step using the combined contrastive objective. Experimental results show that, compared with pretrained CLIP, Omni-NegCLIP improves performance on presence-based and absence-based negation tasks by up to 52.65% and 12.50%, respectively, without sacrificing general image-text retrieval capability, even improving it by up to 19.62%. Compared with prior work, Omni-NegCLIP demonstrates a more comprehensive ability to understand multiple types of negation.
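The two contrastive objectives described in the abstract can be sketched numerically. The following is a minimal NumPy illustration under our own assumptions, not the paper's implementation: the exact loss formulation, temperature, and margin are not given in the abstract, so `temp` and `margin` here are purely illustrative.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def presence_neg_loss(img, cap_pos, cap_neg, temp=0.07):
    """Presence-based objective (sketch): treat the presence-based
    negated caption as a hard negative in an InfoNCE-style softmax,
    pulling the image toward its original caption and pushing it
    away from the negated one."""
    logits = np.array([cosine_sim(img, cap_pos),
                       cosine_sim(img, cap_neg)]) / temp
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -float(np.log(probs[0]))  # cross-entropy on the true caption

def absence_neg_loss(img, cap_pos, cap_absneg, temp=0.07, margin=0.2):
    """Absence-based objective (sketch): both the original caption and
    the absence-based negated caption are true of the image, so the
    image is aligned with BOTH, while a hinge penalty keeps the two
    text embeddings at least `margin` apart in cosine similarity so
    they remain semantically distinct."""
    align = -(cosine_sim(img, cap_pos) + cosine_sim(img, cap_absneg)) / temp
    separation = max(0.0, cosine_sim(cap_pos, cap_absneg) - (1.0 - margin))
    return align + separation
```

In this sketch the presence loss is minimized when the image embedding sits closer to the true caption than to its negated counterpart, while the absence loss rewards alignment with both captions but penalizes the two text embeddings for collapsing onto each other.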
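The front-layer fine-tuning step amounts to freezing everything except the first few transformer layers of the text encoder. Below is a hypothetical sketch of the parameter selection, assuming Hugging Face-style parameter names such as `text_model.encoder.layers.3.self_attn.q_proj.weight`; the abstract does not specify a framework or how many front layers are tuned, so `num_front=4` is an arbitrary choice.

```python
def front_layer_trainable(param_names, num_front=4,
                          prefix="text_model.encoder.layers."):
    """Map each parameter name to True (trainable) or False (frozen).
    Only the first `num_front` transformer layers of the text encoder
    stay trainable; later text layers, the vision tower, and any
    projection heads are all frozen."""
    mask = {}
    for name in param_names:
        if name.startswith(prefix):
            # layer index is the first path component after the prefix
            layer_idx = int(name[len(prefix):].split(".")[0])
            mask[name] = layer_idx < num_front
        else:
            mask[name] = False
    return mask
```

In a real training loop the resulting mask would be applied by setting each parameter's gradient flag accordingly, so that the combined contrastive objective only updates the front text-encoder layers.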

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.29258 [cs.CV]

(or arXiv:2603.29258v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.29258

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jingqi Xu [view email] [v1] Tue, 31 Mar 2026 04:48:52 UTC (504 KB)
