defense 2026

PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks

Jingning Xu, Haochen Luo, Chen Liu



Published on arXiv: 2604.01010

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves consistent robustness gains against various adversarial perturbations while maintaining competitive clean accuracy across multiple VLM architectures

PDA (Paraphrase-Decomposition-Aggregation)

Novel technique introduced


Vision-language models (VLMs) are vulnerable to adversarial image perturbations. Existing defenses based on adversarial training against task-specific adversarial examples are computationally expensive and often fail to generalize to unseen attack types. To address these limitations, we introduce Paraphrase-Decomposition-Aggregation (PDA), a training-free defense framework that leverages text augmentation to enhance VLM robustness under diverse adversarial image attacks. PDA performs prompt paraphrasing, question decomposition, and consistency aggregation entirely at test time, thus requiring no modification to the underlying model. To balance robustness and efficiency, we instantiate efficient PDA variants that reduce inference cost while retaining most of the robustness gains. Experiments on multiple VLM architectures and benchmarks for visual question answering, classification, and captioning show that PDA achieves consistent robustness gains against various adversarial perturbations while maintaining competitive clean accuracy, establishing a generic, strong, and practical inference-time defense framework for VLMs.
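The three test-time stages can be sketched as follows. This is a minimal illustration, not the paper's implementation: `vlm`, `paraphrase`, and `decompose` are hypothetical stand-ins for the underlying model and the text-augmentation components, whose exact interfaces the abstract does not specify.

```python
# Hedged sketch of the PDA (Paraphrase-Decomposition-Aggregation) pipeline.
# All callables here are hypothetical placeholders for illustration only.
from collections import Counter


def pda_answer(vlm, image, question, paraphrase, decompose):
    """Answer a question about an image via text-augmented queries,
    aggregating the candidate answers by consistency (majority vote)."""
    candidates = []

    # 1. Paraphrase: rephrase the prompt so a perturbation tuned to one
    #    phrasing is less likely to fool every variant.
    for variant in paraphrase(question):
        candidates.append(vlm(image, variant))

    # 2. Decompose: split the question into simpler sub-questions, then
    #    re-ask the original question conditioned on the sub-answers.
    sub_questions = decompose(question)
    sub_answers = [vlm(image, q) for q in sub_questions]
    context = " ".join(f"Q: {q} A: {a}"
                       for q, a in zip(sub_questions, sub_answers))
    candidates.append(vlm(image, f"{context} Q: {question}"))

    # 3. Aggregate: return the most consistent (most frequent) answer.
    return Counter(candidates).most_common(1)[0][0]
```

Because every step is a plain inference call, the sketch makes the key trade-off visible: each extra paraphrase or sub-question adds one model query, which is what the efficient variants in the paper are designed to reduce.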


Key Contributions

  • Training-free defense framework using prompt paraphrasing, question decomposition, and consistency aggregation
  • Achieves robustness against diverse adversarial perturbations without model modification
  • Efficient variant instantiations that reduce inference cost while retaining most of the robustness gains

🛡️ Threat Analysis

Input Manipulation Attack

Defends against adversarial image perturbations that cause VLMs to misclassify or produce incorrect outputs at inference time, the classic adversarial-example threat model.


Details

Domains
multimodal, vision, nlp
Model Types
vlm, multimodal, transformer
Threat Tags
inference_time, digital
Applications
visual question answering, image classification, image captioning