Defense · 2025

Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting

Nikoo Naghavian, Mostafa Tavassolipour

0 citations · 57 references · arXiv


Published on arXiv: 2510.02913

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

CAW outperforms PMG-AFT and TGA-ZSR in robust accuracy under AutoAttack on TinyImageNet and 14 zero-shot datasets while using less memory than both baselines.

CAW (Confidence-Aware Weighting)

Novel technique introduced


Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work, we propose Confidence-Aware Weighting (CAW) to enhance zero-shot robustness in vision-language models. CAW consists of two components: (1) a Confidence-Aware loss that prioritizes uncertain adversarial examples by scaling the KL divergence between clean and adversarial predictions, and (2) a feature alignment regularization that preserves semantic consistency by minimizing the distance between frozen and fine-tuned image encoder features on adversarial inputs. These components work jointly to improve both clean and robust accuracy without sacrificing generalization. Extensive experiments on TinyImageNet and 14 additional datasets show that CAW outperforms recent methods such as PMG-AFT and TGA-ZSR under strong attacks like AutoAttack, while using less memory.


Key Contributions

  • Confidence-Aware loss that up-weights uncertain/hard adversarial examples by scaling KL divergence between clean and adversarial prediction distributions
  • Feature alignment regularization that minimizes distance between frozen and fine-tuned CLIP image encoder features on adversarial inputs to preserve semantic knowledge
  • CAW achieves state-of-the-art zero-shot robust accuracy on TinyImageNet and 14 datasets under AutoAttack while requiring less memory than PMG-AFT and TGA-ZSR
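The Confidence-Aware loss above can be illustrated with a minimal sketch. The exact weighting function used by the paper is not specified in this summary, so the 1/confidence weight below is an illustrative assumption; the feature-alignment term is likewise shown as a simple squared distance between frozen and fine-tuned encoder features:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two probability vectors."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def caw_loss(clean_logits, adv_logits, frozen_feat, tuned_feat, lam=1.0):
    """Sketch of the CAW objective's two components:
    (1) KL between clean and adversarial predictions, up-weighted
        when the adversarial prediction is uncertain (hypothetical
        weight: inverse of the adversarial max probability);
    (2) feature alignment: squared L2 distance between frozen and
        fine-tuned image-encoder features on the adversarial input."""
    p_clean = softmax(clean_logits)
    p_adv = softmax(adv_logits)
    weight = 1.0 / max(p_adv)  # uncertain adversarial examples get larger weight
    confidence_term = weight * kl_div(p_clean, p_adv)
    align_term = sum((f - t) ** 2 for f, t in zip(frozen_feat, tuned_feat))
    return confidence_term + lam * align_term
```

With identical clean and adversarial predictions and aligned features, the loss is zero; as the adversarial prediction drifts toward uniform (low confidence), both the KL term and its weight grow, which is the prioritization of hard examples described above.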

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is a defense against adversarial image perturbations (PGD, AutoAttack, CW) that cause misclassification in CLIP at inference time — the canonical ML01 threat of input manipulation / evasion attacks on image classifiers.
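The evasion attacks named above share a common shape: iterative gradient steps projected back into a small perturbation budget. A minimal PGD sketch under an L-infinity constraint (toy gradient oracle, not the paper's attack setup) looks like this:

```python
def pgd_attack(x, grad_fn, eps, alpha, steps):
    """Projected gradient descent under an L-infinity budget:
    take signed gradient steps of size alpha, then clip the
    perturbed input back into the eps-ball around the original.
    grad_fn(x_adv) returns the loss gradient w.r.t. the input."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        # signed ascent step on the loss
        x_adv = [xi + alpha * ((g_i > 0) - (g_i < 0))
                 for xi, g_i in zip(x_adv, g)]
        # project back into the eps-ball around the clean input
        x_adv = [min(max(xa, xo - eps), xo + eps)
                 for xa, xo in zip(x_adv, x)]
    return x_adv
```

AutoAttack ensembles several such attacks (including parameter-free PGD variants), which is why it is the standard stress test for the robust-accuracy numbers reported here.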


Details

Domains
vision, multimodal
Model Types
vlm, transformer
Threat Tags
white_box, inference_time, untargeted, digital
Datasets
TinyImageNet, ImageNet
Applications
image classification, zero-shot classification