survey 2026

Adversarial Defense in Vision-Language Models: An Overview

Xiaowei Fu, Lei Zhang

0 citations · 11 references · ICICML


Published on arXiv · 2601.12443

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Identifies three complementary VLM defense paradigms with distinct tradeoffs: training-time methods offer strong robustness at high compute cost, test-time adaptation provides deployment flexibility, and training-free methods enable real-time defense with minimal overhead.


The widespread use of Vision-Language Models (VLMs, e.g., CLIP) has raised concerns about their vulnerability to sophisticated and imperceptible adversarial attacks, which can compromise model performance and system security in cross-modal tasks. To address this challenge, three main defense paradigms have been proposed: Training-time Defense, Test-time Adaptation Defense, and Training-free Defense. Training-time Defense modifies the training process, typically through adversarial fine-tuning, to improve robustness to adversarial examples; while effective, it requires substantial computational resources and may not generalize to unseen attacks. Test-time Adaptation Defense adapts the model at inference time, updating its parameters to handle unlabeled adversarial examples; this offers deployment flexibility, but often at the cost of added complexity and computational overhead. Training-free Defense avoids modifying the model altogether, instead altering the adversarial inputs or their feature embeddings to counteract perturbations and mitigate the impact of attacks without additional training. This survey reviews the latest advances in adversarial defense strategies for VLMs, highlighting the strengths and limitations of these approaches and discussing open challenges in enhancing VLM robustness.
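To make the training-free paradigm concrete, here is a minimal toy sketch (not any specific method from the survey): a linear scorer is attacked with an FGSM-style sign perturbation, and a simple input-purification step (coarse quantization of the input features) removes the perturbation's effect before scoring. All names and values are illustrative assumptions.

```python
# Toy illustration of a training-free defense: purify the input
# rather than modify the model. The scorer, attack, and quantization
# step are hypothetical stand-ins for real VLM components.

def score(x, w):
    """Linear score; sign(score) is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fgsm(x, w, eps):
    """FGSM-style attack on a linear scorer: step against the score."""
    return [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

def purify(x, step=0.5):
    """Training-free defense: snap each feature to a coarse grid,
    discarding the small adversarial perturbation."""
    return [round(xi / step) * step for xi in x]

w = [1.0, -2.0, 0.5]          # fixed (frozen) model weights
x = [0.5, -0.5, 1.0]          # clean input, scored positive
x_adv = fgsm(x, w, eps=0.7)   # small perturbation flips the sign
x_pur = purify(x_adv)         # quantization restores the sign
```

The key design point mirrored here is that `purify` touches only the input, never `w`: no gradients, no fine-tuning, and negligible overhead, which is why training-free methods suit real-time deployment.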


Key Contributions

  • Systematic taxonomy of adversarial defense strategies for VLMs into three paradigms: Training-time Defense, Test-time Adaptation Defense, and Training-free Defense
  • Comparative experimental analysis of existing defense strategies across the three paradigms
  • Identification of open challenges and future research directions for robust VLMs

🛡️ Threat Analysis

Input Manipulation Attack

The survey's entire focus is on defending VLMs (e.g., CLIP) against adversarial input perturbations that cause misclassification in cross-modal tasks — the core ML01 threat of adversarial examples at inference time.


Details

Domains
vision · nlp · multimodal
Model Types
vlm · transformer
Threat Tags
white_box · training_time · inference_time · digital
Applications
image classification · zero-shot cross-modal tasks · autonomous driving · medical diagnostics