Contrastive Spectral Rectification: Test-Time Defense towards Zero-shot Adversarial Robustness of CLIP
Sen Nie 1,2, Jie Zhang 1,2, Zhuo Wang 3, Shiguang Shan 1,2, Xilin Chen 1,2
Published on arXiv
2601.19210
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
CSR outperforms state-of-the-art test-time defenses by an average of 18.1% against AutoAttack across 16 classification benchmarks on CLIP.
Contrastive Spectral Rectification (CSR)
Novel technique introduced
Vision-language models (VLMs) such as CLIP have demonstrated remarkable zero-shot generalization, yet they remain highly vulnerable to adversarial examples (AEs). While test-time defenses are promising, existing methods fail to provide sufficient robustness against strong attacks and are often hampered by high inference latency and task-specific applicability. To address these limitations, we begin by investigating the intrinsic properties of AEs, revealing that AEs exhibit severe feature inconsistency under progressive frequency attenuation. We further attribute this to the model's inherent spectral bias. Leveraging this insight, we propose an efficient test-time defense named Contrastive Spectral Rectification (CSR). CSR optimizes a rectification perturbation, applied input-adaptively, that realigns the input with the natural manifold under a spectral-guided contrastive objective. Extensive experiments across 16 classification benchmarks demonstrate that CSR outperforms the SOTA by an average of 18.1% against the strong AutoAttack with modest inference overhead. Furthermore, CSR exhibits broad applicability across diverse visual tasks. Code is available at https://github.com/Summu77/CSR.
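The paper's diagnostic — that adversarial examples show feature inconsistency under progressive frequency attenuation — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `low_pass` is a standard FFT low-pass filter, and `encoder` stands in for CLIP's image encoder (here a hypothetical callable; a plain flatten is used in the usage example so the sketch is self-contained).

```python
import numpy as np

def low_pass(img, radius):
    """Keep only spatial frequencies within `radius` of the DC component."""
    f = np.fft.fftshift(np.fft.fft2(img, axes=(0, 1)), axes=(0, 1))
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= radius ** 2
    f = f * mask[..., None] if img.ndim == 3 else f * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(f, axes=(0, 1)), axes=(0, 1)))

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def consistency_curve(img, encoder, radii=(32, 24, 16, 8)):
    """Cosine similarity between the features of the original image and its
    progressively low-pass-filtered versions. Per the paper's finding, this
    curve should degrade much faster for adversarial inputs than clean ones."""
    base = encoder(img)
    return [cosine(base, encoder(low_pass(img, r))) for r in radii]

# Usage with a stand-in encoder (flattening); with CLIP one would pass
# the actual image-encoder forward function instead.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
encoder = lambda x: x.ravel()
curve = consistency_curve(img, encoder, radii=(23, 8, 2))
```

A large enough radius passes the full spectrum, so the first similarity is near 1; tighter cutoffs attenuate more high-frequency content and the similarity drops.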
Key Contributions
- Empirical insight that adversarial examples exhibit severe feature inconsistency under progressive frequency attenuation, attributed to CLIP's inherent spectral bias
- CSR: an input-adaptive test-time defense that optimizes a rectification perturbation to realign adversarial inputs to the natural manifold using a spectral-guided contrastive objective
- Demonstrates an 18.1% average improvement over the SOTA against AutoAttack across 16 classification benchmarks, with modest inference overhead and broad task applicability
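The second contribution — test-time optimization of a rectification perturbation under a frequency-consistency objective — can be sketched as below. Everything here is an assumption-laden stand-in: the paper's actual objective is a spectral-guided contrastive loss optimized with gradients through CLIP, whereas this sketch uses a simple negative-cosine consistency loss, a flatten encoder, and SPSA (random-probe finite differences) so it runs without any model or autograd.

```python
import numpy as np

def low_pass(img, radius):
    """FFT low-pass: keep frequencies within `radius` of the DC component."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def spectral_consistency_loss(img, encoder, radii=(12, 6)):
    """Negative mean feature similarity between the image and its low-pass
    variants: lower loss means the input is more frequency-consistent."""
    base = encoder(img)
    return -float(np.mean([cosine(base, encoder(low_pass(img, r)))
                           for r in radii]))

def rectify(img, encoder, steps=20, lr=0.05, eps=1e-3, rng=None):
    """Optimize an additive perturbation `delta` that makes the input
    frequency-consistent (SPSA stand-in for the paper's gradient-based
    spectral-guided contrastive objective)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    delta = np.zeros_like(img)
    for _ in range(steps):
        probe = rng.choice([-1.0, 1.0], size=img.shape)
        # Two-sided finite-difference estimate of the directional gradient.
        g = (spectral_consistency_loss(img + delta + eps * probe, encoder)
             - spectral_consistency_loss(img + delta - eps * probe, encoder)
             ) / (2 * eps)
        delta -= lr * g * probe
    return np.clip(img + delta, 0.0, 1.0)

# Usage with a stand-in encoder; CSR would use CLIP's image encoder and
# backpropagation instead of SPSA.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
out = rectify(img, lambda x: x.ravel(), steps=5, rng=rng)
```

The design point this illustrates is that the defense is purely test-time: nothing is retrained, only a small per-input perturbation is optimized before the rectified image is passed to the frozen model.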
🛡️ Threat Analysis
Directly defends against adversarial examples (input manipulation attacks) targeting CLIP at inference time; evaluated against strong attacks including AutoAttack; proposes a novel input purification strategy via spectral rectification to restore natural manifold alignment.