Defense · 2025

ATAC: Augmentation-Based Test-Time Adversarial Correction for CLIP

Li Su, Andras Balogh

1 citation · 49 references · arXiv

Published on arXiv · 2511.17362

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

ATAC surpasses previous state-of-the-art test-time adversarial defenses for CLIP by nearly 50% in average robust accuracy across 13 benchmarks, with minimal computational overhead and nontrivial resilience to adaptive attacks.

ATAC (Augmentation-based Test-time Adversarial Correction)

Novel technique introduced


Despite its remarkable success in zero-shot image-text matching, CLIP remains highly vulnerable to adversarial perturbations on images. As adversarial fine-tuning is prohibitively costly, recent works explore various test-time defense strategies; however, these approaches still exhibit limited robustness. In this work, we revisit this problem and propose a simple yet effective strategy: Augmentation-based Test-time Adversarial Correction (ATAC). Our method operates directly in the embedding space of CLIP, calculating augmentation-induced drift vectors to infer a semantic recovery direction and correcting the embedding based on the angular consistency of these latent drifts. Across a wide range of benchmarks, ATAC consistently achieves remarkably high robustness, surpassing that of previous state-of-the-art methods by nearly 50% on average, all while requiring minimal computational overhead. Furthermore, ATAC retains state-of-the-art robustness in unconventional and extreme settings and even achieves nontrivial robustness against adaptive attacks. Our results demonstrate that ATAC is an efficient method in a novel paradigm for test-time adversarial defenses in the embedding space of CLIP.


Key Contributions

  • ATAC: a test-time defense that estimates a semantic recovery direction in CLIP's embedding space using augmentation-induced drift vectors to correct adversarially perturbed image embeddings
  • Empirical insight that gradient-based attacks induce consistent directional shifts in CLIP's feature space, enabling reliable recovery without access to labels
  • Achieves ~50% improvement in robust accuracy over prior SOTA across 13 classification benchmarks while retaining nontrivial robustness against adaptive attacks
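The contributions above can be illustrated with a minimal numpy sketch of the drift-correction idea. This is an illustrative reconstruction from the summary only, not the authors' implementation: the function name `atac_correct`, the step size `alpha`, and the specific choice of mean drift scaled by average pairwise cosine are all assumptions.

```python
# Hypothetical sketch of ATAC-style embedding correction.
# Given an (possibly adversarial) image embedding and the embeddings of
# several augmented views, compute augmentation-induced drift vectors,
# measure their angular consistency, and nudge the embedding along the
# inferred recovery direction. Details (weighting, step size) are assumed.
import numpy as np

def _cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def atac_correct(z_img, z_augs, alpha=1.0):
    """Correct an image embedding using its augmented views' embeddings.

    z_img  : (d,) embedding of the input image
    z_augs : list of (d,) embeddings of augmented versions of the image
    """
    # Drift vectors: how each augmentation moves the embedding.
    drifts = [z_a - z_img for z_a in z_augs]

    # Angular consistency: mean pairwise cosine similarity of the drifts.
    # Consistent drifts suggest a reliable semantic recovery direction.
    n, cos_sum, pairs = len(drifts), 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            cos_sum += _cosine(drifts[i], drifts[j])
            pairs += 1
    consistency = cos_sum / pairs if pairs else 0.0

    # Recovery direction: mean drift, scaled by (non-negative) consistency.
    direction = np.mean(drifts, axis=0)
    corrected = z_img + alpha * max(consistency, 0.0) * direction

    # CLIP embeddings live on the unit sphere, so re-normalize.
    return corrected / np.linalg.norm(corrected)
```

In use, `z_img` and each element of `z_augs` would come from CLIP's image encoder applied to the input and its augmentations; the corrected embedding is then matched against text embeddings as usual.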

🛡️ Threat Analysis

Input Manipulation Attack

Paper proposes a defense (ATAC) against adversarial image perturbations targeting CLIP at inference time. The threat model is standard adversarial examples (gradient-based attacks) on images, and the defense operates in embedding space to correct perturbed inputs — a direct ML01 defense.


Details

Domains
vision · multimodal
Model Types
vlm · transformer
Threat Tags
white_box · black_box · inference_time · untargeted · digital
Datasets
ImageNet
Applications
zero-shot image classification · image-text matching