Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessEnd of an era: Elon Musk says Tesla is no longer producing the Model S and XBusiness InsiderOpenAI's new partner wants to build ads that can chat with youBusiness InsiderQ1 2026 Shatters Venture Funding Records As AI Boom Pushes Startup Investment To Nearly $300BCrunchbase NewsMeet 'Dobby': The AI agent that could kill the app economyBusiness InsiderThis company is turning YouTube videos into TV shows as streamers chase Gen AlphaBusiness InsiderWhat to expect from WWDC 2026EngadgetThe gig workers who are training humanoid robots at homeMIT Technology Review AITech creators are getting the star treatment at a new talent management firmBusiness InsiderBaidu’s robotaxis froze in traffic creating chaosThe Verge AI9 companies that have done AI-related layoffsBusiness InsiderSlack's upgraded AI can analyze how you workEngadgetTown hall in Bay Ridge spotlights AI concerns in NYC public schools - BKReaderGoogle News: AI SafetyBlack Hat USADark ReadingBlack Hat AsiaAI BusinessEnd of an era: Elon Musk says Tesla is no longer producing the Model S and XBusiness InsiderOpenAI's new partner wants to build ads that can chat with youBusiness InsiderQ1 2026 Shatters Venture Funding Records As AI Boom Pushes Startup Investment To Nearly $300BCrunchbase NewsMeet 'Dobby': The AI agent that could kill the app economyBusiness InsiderThis company is turning YouTube videos into TV shows as streamers chase Gen AlphaBusiness InsiderWhat to expect from WWDC 2026EngadgetThe gig workers who are training humanoid robots at homeMIT Technology Review AITech creators are getting the star treatment at a new talent management firmBusiness InsiderBaidu’s robotaxis froze in traffic creating chaosThe Verge AI9 companies that have done AI-related layoffsBusiness InsiderSlack's upgraded AI can analyze how you workEngadgetTown hall in Bay Ridge spotlights AI concerns in NYC public schools - BKReaderGoogle News: AI Safety

When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models

arXivMarch 31, 20261 min read0 views
Source Quiz

arXiv:2603.27759v1 Announce Type: new Abstract: Visual-Language Models (VLMs) have demonstrated exceptional cross-modal understanding across various tasks, including zero-shot classification, image captioning, and visual question answering. However, their robustness to physically plausible non-rigid deformations-such as wrinkles on flexible surfaces-remains poorly understood. In this work, we propose a parametric structural perturbation method inspired by the mechanics of three-dimensional fabric wrinkles. Specifically, our method generates photorealistic non-rigid perturbations by constructin — Chengyin Hu, Xuemeng Sun, Jiajun Han, Qike Zhang, Xiang Chen, Xin Wang, Yiwei Wei, Jiahua Long

View PDF HTML (experimental)

Abstract:Visual-Language Models (VLMs) have demonstrated exceptional cross-modal understanding across various tasks, including zero-shot classification, image captioning, and visual question answering. However, their robustness to physically plausible non-rigid deformations-such as wrinkles on flexible surfaces-remains poorly understood. In this work, we propose a parametric structural perturbation method inspired by the mechanics of three-dimensional fabric wrinkles. Specifically, our method generates photorealistic non-rigid perturbations by constructing multi-scale wrinkle fields and integrating displacement field distortion with surface-consistent appearance variations. To achieve an optimal balance between visual naturalness and adversarial effectiveness, we design a hierarchical fitness function in a low-dimensional parameter space and employ an optimization-based search strategy. We evaluate our approach using a two-stage framework: perturbations are first optimized on a zero-shot classification proxy task and subsequently assessed for transferability on generative tasks. Experimental results demonstrate that our method significantly degrades the performance of various state-of-the-art VLMs, consistently outperforming baselines in both image captioning and visual question-answering tasks.

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27759 [cs.CV]

(or arXiv:2603.27759v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27759

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Xiang Chen [view email] [v1] Sun, 29 Mar 2026 16:35:18 UTC (6,762 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
When Surfac…researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 174 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers