Omni-NegCLIP: Enhancing CLIP with Front-Layer Contrastive Fine-Tuning for Comprehensive Negation Understanding
Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities across a wide range of multimodal tasks. However, recent studies have shown that VLMs such as CLIP perform poorly at understanding negation expressions, which are common in natural language. In this work, we propose Omni-NegCLIP, a fine-tuned CLIP model that improves CLIP's understanding of two types of negation by modifying CLIP's original InfoNCE contrastive loss: presence-based negation, i.e., negated mentions of objects that are actually present in an image, and absence-based negation, i.e., negated mentions of objects that could plausibly appear in an image but are in fact absent. Specifically, we design a presence-based contrastive objective that pulls image embeddings closer to their original caption embeddings while pushing them away from the corresponding presence-based negated caption embeddings, and an absence-based contrastive objective that aligns image embeddings with both the original and the absence-based negated caption embeddings while maintaining a semantic distinction between the two text embeddings. Based on our observation that the front transformer layers of the CLIP text encoder learn negated text better than the later layers do, we fine-tune only the front transformer layers of the text encoder at each training step using the combined contrastive objective. Experimental results show that, compared with pretrained CLIP, Omni-NegCLIP improves performance on presence-based and absence-based negation tasks by up to 52.65% and 12.50%, respectively, without sacrificing general image-text retrieval capability, even improving it by up to 19.62%. Compared with prior work, Omni-NegCLIP demonstrates a more comprehensive ability to understand multiple types of negation tasks.
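The two contrastive objectives described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact loss formulation is not given in the abstract, so the temperature `tau`, the margin `margin`, and the margin-based separation term standing in for the "semantic distinction" constraint are all assumptions.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def presence_negation_loss(img, cap_pos, cap_neg, tau=0.07):
    # InfoNCE-style term (hypothetical formulation): pull the image
    # embedding toward the original caption and treat the presence-based
    # negated caption as a hard negative.
    s_pos = cosine(img, cap_pos) / tau
    s_neg = cosine(img, cap_neg) / tau
    m = max(s_pos, s_neg)  # log-sum-exp stabilisation
    log_denom = m + np.log(np.exp(s_pos - m) + np.exp(s_neg - m))
    return -(s_pos - log_denom)

def absence_negation_loss(img, cap_pos, cap_neg_abs, margin=0.2):
    # Align the image with both the original caption and the
    # absence-based negated caption (both are true of the image), while
    # a hinge term keeps the two text embeddings at least `margin`
    # apart in cosine similarity (assumed stand-in for the paper's
    # "semantic distinction" between the two text embeddings).
    align = (1 - cosine(img, cap_pos)) + (1 - cosine(img, cap_neg_abs))
    separate = max(0.0, cosine(cap_pos, cap_neg_abs) - (1 - margin))
    return align + separate
```

In a training loop, the combined objective would be a (possibly weighted) sum of the two terms per batch; per the abstract, gradients would be applied only to the front transformer layers of the text encoder.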
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29258 [cs.CV]
(or arXiv:2603.29258v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.29258
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Jingqi Xu [v1] Tue, 31 Mar 2026 04:48:52 UTC (504 KB)
