
The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models

arXiv · March 31, 2026

arXiv:2603.27139v1 · Announce Type: new

Authors: Shivang Chopra, Shaunak Halbe, Chengyue Huan, Brisa Maneechotesuwan, Zsolt Kira


Abstract: Fine-tuning approaches for Vision-Language Models (VLMs) face a critical three-way trade-off between In-Distribution (ID) accuracy, Out-of-Distribution (OOD) generalization, and adversarial robustness. Existing robust fine-tuning strategies resolve at most two axes of this trade-off. Generalization-preserving methods retain ID/OOD performance but leave models vulnerable to adversarial attacks, while adversarial training improves robustness to targeted attacks but degrades ID/OOD accuracy. Our key insight is that the robustness trade-off stems from two geometric failures: sharp, anisotropic minima in parameter space and unstable feature representations that deform under perturbation. To address this, we propose GRACE (Gram-aligned Robustness via Adaptive Curvature Estimation), a unified fine-tuning framework that jointly regularizes the parameter-space curvature and feature-space invariance for VLMs. Grounded in Robust PAC-Bayes theory, GRACE employs adaptive weight perturbations scaled by local curvature to promote flatter minima, combined with a feature alignment loss that maintains representation consistency across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models, GRACE simultaneously improves ID accuracy by 10.8% and adversarial accuracy by 13.5% while maintaining 57.0% OOD accuracy (vs. 57.4% zero-shot baseline). Geometric analysis confirms that GRACE converges to flatter minima without feature distortion across distribution shifts, providing a principled step toward generalized robustness in foundation VLMs.
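The abstract names two ingredients, a Gram-based feature alignment loss and a weight perturbation scaled by local curvature, but gives no formulas. The following is a minimal NumPy sketch of what such terms could look like, not the paper's actual method. The function names, the cosine-similarity Gram matrix, the mean-squared penalty, the SAM-style normalized ascent step, and the `scale` vector standing in for a curvature estimate are all assumptions made for illustration.

```python
import numpy as np

def gram_matrix(features):
    # Rows are per-example embeddings; normalize each row so that the
    # Gram entries are pairwise cosine similarities within the batch.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit @ unit.T

def gram_alignment_loss(clean_features, shifted_features):
    # Assumed form: mean squared difference between the Gram matrices of
    # clean and shifted (adversarial or OOD) features for the same batch.
    diff = gram_matrix(clean_features) - gram_matrix(shifted_features)
    return float(np.mean(diff ** 2))

def scaled_weight_perturbation(grad, scale, rho=0.05):
    # SAM-style ascent direction, reweighted elementwise by `scale`.
    # `scale` is a hypothetical stand-in for a per-parameter curvature
    # estimate; the result is rescaled to have L2 norm rho.
    direction = scale * grad
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        return np.zeros_like(grad)
    return rho * direction / norm

rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 8))                    # toy batch of 4 embeddings
perturbed = clean + 0.1 * rng.normal(size=(4, 8))  # toy "shifted" embeddings

print(gram_alignment_loss(clean, clean))      # identical features give 0.0
print(gram_alignment_loss(clean, perturbed))  # positive when structure changes

grad = rng.normal(size=8)
scale = np.abs(rng.normal(size=8))
eps = scaled_weight_perturbation(grad, scale, rho=0.05)
print(np.linalg.norm(eps))                    # approximately rho
```

One property of this assumed loss is that it compares similarity structure rather than raw coordinates, so rotating every embedding by the same orthogonal matrix leaves it unchanged. In a training loop the perturbation would be added to the weights before computing the gradient used for the update, as in sharpness-aware minimization.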

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27139 [cs.CV]

(or arXiv:2603.27139v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27139

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shivang Chopra
[v1] Sat, 28 Mar 2026 05:22:00 UTC (5,308 KB)
