
GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection

arXiv cs.CV · Submitted on 31 Mar 2026

Abstract: Current deepfake attribution and detection works tend to generalize poorly to novel generative methods because they explore the visual modality alone. They assess attribution or detection performance on unseen advanced generators only coarsely, and fail to consider the synergy of the two tasks. To this end, we propose a novel gaze-guided CLIP with adaptive-enhanced fine-grained language prompts for fine-grained deepfake attribution and detection (DFAD). Specifically, we construct a novel fine-grained benchmark to evaluate the DFAD performance of networks on novel generators such as diffusion and flow models. Additionally, we introduce a gaze-aware model based on CLIP, devised to enhance generalization to unseen face forgery attacks. Built upon the novel observation that there are significant distribution differences between pristine and forged gaze vectors, and that facial images generated by GANs and diffusion models differ significantly in how well they preserve the target gaze, we design a visual perception encoder that exploits these inherent gaze differences to mine global forgery embeddings across the appearance and gaze domains. We propose a gaze-aware image encoder (GIE) that fuses forgery gaze prompts extracted via a gaze encoder with common forged-image embeddings to capture general attribution patterns, allowing features to be transformed into a more stable and common DFAD feature space. We build a language refinement encoder (LRE) that generates dynamically enhanced language embeddings via an adaptive-enhanced word selector for precise vision-language matching. Extensive experiments on our benchmark show that our model outperforms the state of the art by 6.56% ACC and 5.32% AUC on average under the attribution and detection settings, respectively. Code will be available on GitHub.
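To make the two fusion steps the abstract names more concrete, here is a minimal PyTorch sketch of how a gaze-aware image encoder (GIE) and an adaptive word selector for the language refinement encoder (LRE) could plug into CLIP-style matching. This is an illustrative reconstruction under stated assumptions, not the authors' implementation (the abstract says their code will be released on GitHub); all class names, layer choices, and dimensions (GazeAwareImageEncoder, AdaptiveWordSelector, dim=512, 8 attention heads) are hypothetical.

```python
# Hypothetical sketch of the GIE/LRE fusion described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeAwareImageEncoder(nn.Module):
    """GIE idea: fuse gaze-derived forgery prompts with CLIP image embeddings."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.gaze_proj = nn.Linear(dim, dim)  # project gaze-encoder features
        self.fuse = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_emb: torch.Tensor, gaze_emb: torch.Tensor) -> torch.Tensor:
        # img_emb: (B, D) CLIP image features; gaze_emb: (B, D) gaze features
        q = img_emb.unsqueeze(1)                    # image features attend to ...
        kv = self.gaze_proj(gaze_emb).unsqueeze(1)  # ... gaze forgery prompts
        fused, _ = self.fuse(q, kv, kv)
        return self.norm(fused.squeeze(1) + img_emb)  # residual fusion

class AdaptiveWordSelector(nn.Module):
    """LRE idea: re-weight per-token text embeddings conditioned on the image."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, word_emb: torch.Tensor, img_emb: torch.Tensor) -> torch.Tensor:
        # word_emb: (B, T, D) tokens of one class prompt; img_emb: (B, D)
        ctx = img_emb.unsqueeze(1).expand(-1, word_emb.size(1), -1)
        w = torch.softmax(self.score(torch.cat([word_emb, ctx], dim=-1)), dim=1)
        return (w * word_emb).sum(dim=1)  # enhanced language embedding, (B, D)

def dfad_logits(fused_img, class_prompts, selector, temperature=0.07):
    """CLIP-style matching of fused image features against refined class prompts."""
    # fused_img: (B, D); class_prompts: (C, T, D), one prompt per generator class
    v = F.normalize(fused_img, dim=-1)
    logits = []
    for c in range(class_prompts.size(0)):
        tokens = class_prompts[c].unsqueeze(0).expand(fused_img.size(0), -1, -1)
        t = selector(tokens, fused_img)            # image-conditioned refinement
        logits.append((v * F.normalize(t, dim=-1)).sum(-1))
    return torch.stack(logits, dim=1) / temperature  # (B, C) attribution logits

# Usage (shapes only): fuse, refine, then match.
B, C, T, D = 4, 10, 8, 512
gie, sel = GazeAwareImageEncoder(D), AdaptiveWordSelector(D)
img = torch.randn(B, D)         # CLIP image features (e.g. frozen backbone)
gaze = torch.randn(B, D)        # gaze-encoder features
prompts = torch.randn(C, T, D)  # token embeddings for C generator-class prompts
logits = dfad_logits(gie(img, gaze), prompts, sel)  # (B, C)
```

The binary detection decision can then reuse the same matching, e.g. by grouping the C generator classes against a "pristine" prompt, which is one plausible way the two tasks could share features as the abstract suggests.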

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.29295 [cs.CV]

(or arXiv:2603.29295v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.29295

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yaning Zhang [v1] Tue, 31 Mar 2026 05:59:59 UTC (12,562 KB)
