FPT-Noise: Dynamic Scene-Aware Counterattack for Test-Time Adversarial Defense in Vision-Language Models

Jia Deng 1, Jin Li 1, Zhenhua Zhao 2, Shaowei Wang 1

2 citations · 46 references · arXiv

Published on arXiv · 2510.20856

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

FPT-Noise boosts CLIP's average robust accuracy from 0.07% to 56.86% under AutoAttack across 16 datasets, with only a 1.1% drop in clean accuracy.

FPT-Noise

Novel technique introduced


Vision-Language Models (VLMs), such as CLIP, have demonstrated remarkable zero-shot generalizability across diverse downstream tasks. However, recent studies have revealed that VLMs, including CLIP, are highly vulnerable to adversarial attacks, particularly on their visual modality. Traditional methods for improving adversarial robustness, such as adversarial training, involve extensive retraining and can be computationally expensive. In this paper, we propose a new test-time defense, Feature Perception Threshold Counterattack Noise (FPT-Noise), which enhances the adversarial robustness of CLIP without costly fine-tuning. Our core contributions are threefold. First, we introduce a Dynamic Feature Modulator that dynamically generates an image-specific and attack-adaptive noise intensity parameter. Second, we reanalyze the image features of CLIP: when images are exposed to different levels of noise, clean images and adversarial images exhibit distinct rates of feature change, and we establish a feature perception threshold to distinguish clean images from attacked ones. Finally, we integrate a Scene-Aware Regulation guided by a stability threshold and leverage Test-Time Transformation Ensembling (TTE) to further mitigate the impact of residual noise and enhance robustness. Extensive experiments demonstrate that FPT-Noise significantly outperforms existing test-time defense methods, boosting average robust accuracy from 0.07% to 56.86% under AutoAttack while maintaining high performance on clean images (-1.1%). The code will be made public following the publication of the study.
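The feature-perception-threshold idea from the abstract can be illustrated with a minimal sketch: inject Gaussian noise of increasing intensity, measure how fast the normalized features drift in cosine similarity, and flag inputs whose drift rate exceeds a threshold. Everything here is an illustrative assumption — a toy linear projection stands in for CLIP's image encoder, and the threshold value `tau` is made up, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear projection standing in for CLIP's image encoder (assumption).
W = rng.standard_normal((64, 32))

def extract_features(x):
    """Map a flattened 'image' to an L2-normalized feature vector."""
    f = x @ W
    return f / np.linalg.norm(f)

def feature_change_rate(x, sigmas=(0.05, 0.1, 0.2)):
    """Average cosine-similarity drop as Gaussian noise of growing intensity
    is injected; the paper's observation is that adversarial images drift
    faster under such noise than clean images do."""
    f0 = extract_features(x)
    drops = []
    for sigma in sigmas:
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        drops.append(1.0 - float(extract_features(noisy) @ f0))
    return float(np.mean(drops))

def is_adversarial(x, tau=0.05):
    """Flag the input when its feature change rate exceeds the perception
    threshold tau (an illustrative value for this toy setting)."""
    return feature_change_rate(x) > tau
```

In the actual method this decision gates the counterattack noise, so clean inputs skip the extra processing entirely.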


Key Contributions

  • Dynamic Feature Modulator that generates image-specific and attack-adaptive counterattack noise intensity parameters without manual tuning
  • Feature Perception Threshold mechanism that distinguishes clean from adversarial images based on differential rates of feature change under noise, preventing unnecessary processing of clean inputs
  • Scene-Aware Regulation with stability threshold and Test-Time Transformation Ensembling (TTE) to further mitigate residual noise and preserve zero-shot generalization
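The TTE component mentioned in the last bullet can be sketched as averaging a model's scores over a small set of input transformations at inference time. This is a hedged toy version: the scorer and the specific transformations below are assumptions for illustration, not the ensemble used in the paper.

```python
import numpy as np

def classify(x):
    """Toy scorer: returns logits for 3 classes from a flattened input
    (a deterministic stand-in for a real VLM classifier head)."""
    W = np.arange(x.size * 3, dtype=float).reshape(x.size, 3) % 7 - 3
    return x @ W

def tte_predict(x, transforms):
    """Test-Time Transformation Ensembling: average logits over the
    transformed views of the input, then take the argmax."""
    logits = np.mean([classify(t(x)) for t in transforms], axis=0)
    return int(np.argmax(logits))

# Illustrative transformations (1-D analogues of flip/shift augmentations).
transforms = [
    lambda x: x,             # identity
    lambda x: x[::-1],       # flip
    lambda x: np.roll(x, 1), # small shift
]
```

Averaging over views smooths out perturbation-specific score fluctuations, which is why TTE helps absorb residual noise left after the counterattack step.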

🛡️ Threat Analysis

Input Manipulation Attack

Directly defends against adversarial input perturbations (AutoAttack, PGD) on the visual modality of CLIP at inference time — the paper's entire contribution is a defense mechanism against gradient-based adversarial examples causing misclassification.


Details

Domains
vision · multimodal
Model Types
vlm · transformer
Threat Tags
white_box · inference_time · untargeted · digital
Datasets
CIFAR-10 · MNIST · ImageNet
Applications
image classification · zero-shot classification · vision-language model robustness