Efficient Adversarial Training via Criticality-Aware Fine-Tuning
Wenyun Li 1,2, Zheng Zhang 2, Dongmei Jiang 1,2, Yaowei Wang 2,3, Xiangyuan Lan 1,2
Published on arXiv
2604.12780
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Incurs only a 4.3% decrease in adversarial robustness compared with full adversarial training while tuning approximately 6% of the parameters
CAAT
Novel technique introduced
Vision Transformer (ViT) models have achieved remarkable performance across various vision tasks, with scalability being a key advantage when applied to large datasets. This scalability enables ViT models to exhibit strong generalization capabilities. However, as the number of parameters increases, the robustness of ViT models to adversarial examples does not scale proportionally. Adversarial training (AT), one of the most effective methods for enhancing robustness, typically requires fine-tuning the entire model, leading to prohibitively high computational costs, especially for large ViT architectures. In this paper, we aim to robustly fine-tune only a small subset of parameters to achieve robustness comparable to standard AT. To accomplish this, we introduce Criticality-Aware Adversarial Training (CAAT), a novel method that adaptively allocates resources to the most robustness-critical parameters, fine-tuning only selected modules. Specifically, CAAT efficiently identifies the parameters that contribute most to adversarial robustness. It then leverages parameter-efficient fine-tuning (PEFT) to robustly adjust the weight matrices in which the number of critical parameters exceeds a predefined threshold. CAAT exhibits favorable generalization when scaled to larger vision transformer architectures, potentially paving the way for adversarial training at scale. For example, compared with plain adversarial training, CAAT incurs only a 4.3% decrease in adversarial robustness while tuning approximately 6% of the model's parameters. Extensive experiments on three widely used adversarial learning datasets demonstrate that CAAT outperforms state-of-the-art lightweight AT methods with fewer trainable parameters.
Key Contributions
- CAAT method that identifies robustness-critical parameters and selectively fine-tunes them
- Parameter-efficient adversarial training that achieves robustness comparable to full AT with ~6% trainable parameters
- Scalable adversarial training approach for large Vision Transformer architectures
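The module-selection step described in the abstract (score parameters, then fine-tune only the weight matrices whose count of critical parameters exceeds a threshold) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the summary does not say how CAAT computes criticality, so the per-parameter scores are taken as given inputs, and the function name `select_modules`, the `top_fraction` cutoff, the `count_threshold` parameter, and the module names are all hypothetical rather than taken from the paper.

```python
from typing import Dict, List


def select_modules(
    scores: Dict[str, List[float]],
    top_fraction: float,
    count_threshold: int,
) -> List[str]:
    """Pick the weight matrices that would receive PEFT updates.

    scores: hypothetical per-parameter criticality scores, keyed by
        weight-matrix name (how they are computed is NOT specified here).
    top_fraction: assumed share of all parameters treated as critical.
    count_threshold: a matrix is selected only if it holds more than
        this many critical parameters.
    """
    # Rank every parameter in the model by its criticality score.
    all_scores = sorted(
        (s for values in scores.values() for s in values), reverse=True
    )
    # The k-th largest score becomes the global criticality cutoff.
    k = max(1, int(len(all_scores) * top_fraction))
    cutoff = all_scores[k - 1]

    selected = []
    for name, values in scores.items():
        # Count how many parameters in this matrix are critical.
        n_critical = sum(1 for s in values if s >= cutoff)
        if n_critical > count_threshold:
            selected.append(name)
    return selected


# Toy scores for three hypothetical ViT weight matrices.
toy_scores = {
    "block0.attn.qkv": [0.9, 0.8, 0.1, 0.7],
    "block0.mlp.fc1": [0.2, 0.1, 0.05, 0.3],
    "block1.attn.qkv": [0.6, 0.05, 0.02, 0.01],
}

selected = select_modules(toy_scores, top_fraction=0.25, count_threshold=1)
print(selected)
```

In this toy setting only the first matrix holds more than one of the top-scoring parameters, so it alone would be adapted while the other two stay frozen. In a real pipeline the selected names would be handed to a PEFT method (for example, low-rank adapters) and trained on adversarial examples; that part is outside the scope of this sketch.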
🛡️ Threat Analysis
Defense against adversarial examples via adversarial training — focuses on improving robustness to input manipulation attacks at inference time.