Defense · 2026

The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models

Shivang Chopra, Shaunak Halbe, Chengyue Huan, Brisa Maneechotesuwan, Zsolt Kira

Published on arXiv: 2603.27139

Input Manipulation Attack (OWASP ML Top 10 — ML01)

Key Finding

Achieves a 10.8% improvement in ID accuracy and a 13.5% improvement in adversarial accuracy while maintaining 57.0% OOD accuracy on ImageNet fine-tuning of CLIP

GRACE (novel technique introduced)


Fine-tuning approaches for Vision-Language Models (VLMs) face a critical three-way trade-off between In-Distribution (ID) accuracy, Out-of-Distribution (OOD) generalization, and adversarial robustness. Existing robust fine-tuning strategies resolve at most two axes of this trade-off: generalization-preserving methods retain ID/OOD performance but leave models vulnerable to adversarial attacks, while adversarial training improves robustness to targeted attacks but degrades ID/OOD accuracy. Our key insight is that the robustness trade-off stems from two geometric failures: sharp, anisotropic minima in parameter space and unstable feature representations that deform under perturbation. To address this, we propose GRACE (Gram-aligned Robustness via Adaptive Curvature Estimation), a unified fine-tuning framework that jointly regularizes parameter-space curvature and feature-space invariance for VLMs. Grounded in Robust PAC-Bayes theory, GRACE employs adaptive weight perturbations scaled by local curvature to promote flatter minima, combined with a feature alignment loss that maintains representation consistency across clean, adversarial, and OOD inputs. On ImageNet fine-tuning of CLIP models, GRACE simultaneously improves ID accuracy by 10.8% and adversarial accuracy by 13.5% while maintaining 57.0% OOD accuracy (vs. the 57.4% zero-shot baseline). Geometric analysis confirms that GRACE converges to flatter minima without feature distortion across distribution shifts, providing a principled step toward generalized robustness in foundation VLMs.
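
To make the two mechanisms in the abstract concrete, here is a minimal PyTorch sketch of what such a training step could look like, assuming an ASAM-style adaptive weight perturbation (per-parameter scaling by |w| as a cheap curvature proxy) and a Frobenius distance between batch feature Gram matrices as the alignment term. All names (`grace_step`, `gram_align`, `rho`, `lam`) and the |w| scaling are assumptions for illustration, not the authors' code; the paper's actual curvature estimate may differ.

```python
import torch
import torch.nn.functional as F

def gram_align(feat_clean: torch.Tensor, feat_adv: torch.Tensor) -> torch.Tensor:
    """Frobenius distance between batch Gram matrices of L2-normalized features."""
    fc = F.normalize(feat_clean, dim=-1)
    fa = F.normalize(feat_adv, dim=-1)
    return (fc @ fc.T - fa @ fa.T).pow(2).mean()

def grace_step(model, head, opt, x_clean, x_adv, y, rho=0.05, lam=1.0):
    """One SAM-style two-pass update plus a Gram alignment penalty (hypothetical)."""
    def objective():
        feats_c = model(x_clean)   # clean-view features
        feats_a = model(x_adv)     # adversarial/OOD-view features (inputs precomputed)
        task = F.cross_entropy(head(feats_c), y)
        return task + lam * gram_align(feats_c, feats_a)

    # Pass 1: gradients at the current weights.
    opt.zero_grad()
    objective().backward()

    # Adaptive ascent step: scale each parameter's perturbation by |w|,
    # an ASAM-style proxy for local loss-surface scale/curvature.
    with torch.no_grad():
        scaled = [p.abs() * p.grad for p in model.parameters() if p.grad is not None]
        norm = torch.norm(torch.stack([s.norm() for s in scaled])) + 1e-12
        eps = {}
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.abs().pow(2) * p.grad / norm
            p.add_(e)              # move to the worst-case nearby weights
            eps[p] = e

    # Pass 2: gradients at the perturbed weights (these drive the update).
    opt.zero_grad()
    loss = objective()
    loss.backward()

    # Restore the original weights, then take the descent step.
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    opt.step()
    return loss.item()
```

Penalizing the loss at curvature-scaled perturbed weights favors flat minima, while the Gram term penalizes deformation of the feature geometry between clean and perturbed views, matching the two geometric failures the abstract identifies.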


Key Contributions

  • GRACE framework that jointly regularizes parameter-space curvature and feature-space invariance for robust VLM fine-tuning
  • Theoretical grounding in Robust PAC-Bayes theory connecting geometric properties to robustness (a generic bound of this kind is sketched after this list)
  • Simultaneous improvement across all three axes: ID accuracy (+10.8%), adversarial accuracy (+13.5%), and maintained OOD generalization
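
For intuition on the Robust PAC-Bayes grounding referenced above, the sketch below gives a generic McAllester-style bound with the usual loss replaced by its worst-case counterpart over an ε-ball of input perturbations. This is a standard form shown only for illustration; the paper's actual theorem and constants may differ.

```latex
% Generic robust PAC-Bayes bound (illustrative; not the paper's exact statement).
% P: prior over weights, Q: posterior centered at the fine-tuned weights,
% S: training set of size m, \delta: confidence parameter.
\[
  \ell^{\mathrm{adv}}(w; x, y) = \max_{\|\delta'\|_\infty \le \epsilon} \ell(w; x + \delta', y)
\]
\[
  \mathbb{E}_{w \sim Q}\big[L_{\mathcal{D}}^{\mathrm{adv}}(w)\big]
  \;\le\;
  \mathbb{E}_{w \sim Q}\big[\widehat{L}_{S}^{\mathrm{adv}}(w)\big]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(m/\delta)}{2(m-1)}}
\]
% For a Gaussian posterior Q = N(w*, sigma^2 I), the empirical term stays close
% to the point estimate only when the adversarial loss surface is flat around w*,
% which is the quantity that curvature-scaled weight perturbations control.
```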

🛡️ Threat Analysis

Input Manipulation Attack

The paper directly addresses the robustness of VLMs to adversarial attacks at inference time. The proposed GRACE method defends against adversarial examples by regularizing parameter-space curvature and feature-space invariance, achieving a 13.5% improvement in adversarial accuracy. The paper explicitly discusses this vulnerability and evaluates the proposed defense on adversarial inputs.
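
For reference, ML01-style input manipulation at inference time is typically instantiated with a projected-gradient attack. The sketch below shows a standard L-infinity PGD loop of the kind commonly used to measure adversarial accuracy; the classifier interface (`clf` returning logits over [0, 1] images) and the budget values are assumptions, not the paper's exact evaluation protocol.

```python
import torch
import torch.nn.functional as F

def pgd_attack(clf, x, y, eps=4/255, alpha=1/255, steps=10):
    """Standard L-infinity PGD: ascend the loss, project back to the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(clf(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # gradient-sign ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep valid pixel range
    return x_adv.detach()

# Adversarial accuracy = fraction of PGD-perturbed inputs the model still
# classifies correctly; the 13.5% figure above is an improvement on such a metric.
```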


Details

Domains
vision, multimodal
Model Types
vlm, transformer, multimodal
Threat Tags
inference_time, digital
Datasets
ImageNet
Applications
image classification, vision-language model fine-tuning