defense 2026

Efficient Adversarial Training via Criticality-Aware Fine-Tuning

Wenyun Li 1,2, Zheng Zhang 2, Dongmei Jiang 1,2, Yaowei Wang 2,3, Xiangyuan Lan 1,2

Published on arXiv

2604.12780

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Incurs only a 4.3% decrease in adversarial robustness compared with full adversarial training while tuning approximately 6% of the parameters

CAAT

Novel technique introduced


Vision Transformer (ViT) models have achieved remarkable performance across various vision tasks, with scalability being a key advantage when applied to large datasets. This scalability enables ViT models to exhibit strong generalization capabilities. However, as the number of parameters increases, the robustness of ViT models to adversarial examples does not scale proportionally. Adversarial training (AT), one of the most effective methods for enhancing robustness, typically requires fine-tuning the entire model, leading to prohibitively high computational costs, especially for large ViT architectures. In this paper, we aim to robustly fine-tune only a small subset of parameters to achieve robustness comparable to standard AT. To accomplish this, we introduce Criticality-Aware Adversarial Training (CAAT), a novel method that adaptively allocates resources to the most robustness-critical parameters, fine-tuning only selected modules. Specifically, CAAT efficiently identifies the parameters that contribute most to adversarial robustness. It then leverages parameter-efficient fine-tuning (PEFT) to robustly adjust the weight matrices in which the number of critical parameters exceeds a predefined threshold. CAAT exhibits favorable generalization when scaled to larger vision transformer architectures, potentially paving the way for adversarial training at scale. For example, compared with plain adversarial training, CAAT incurs only a 4.3% decrease in adversarial robustness while tuning approximately 6% of its parameters. Extensive experiments on three widely used adversarial learning datasets demonstrate that CAAT outperforms state-of-the-art lightweight AT methods with fewer trainable parameters.
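The selection step described in the abstract (score parameters, then keep only the weight matrices with more than a threshold number of critical entries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the criticality score |w * dL_adv/dw| (a first-order saliency), the global top-fraction cutoff, and the names `criticality_scores` and `select_modules` are hypothetical, because this summary does not say how CAAT scores parameters. Gradients are assumed to come from the adversarial loss, and the module names are made up for the demo.

```python
from typing import Dict, List

Matrix = List[List[float]]


def criticality_scores(weight: Matrix, grad: Matrix) -> List[float]:
    """Hypothetical per-parameter score |w * dL_adv/dw|, flattened row-major."""
    return [abs(w * g) for w_row, g_row in zip(weight, grad) for w, g in zip(w_row, g_row)]


def select_modules(weights: Dict[str, Matrix], grads: Dict[str, Matrix],
                   top_fraction: float, threshold: int) -> List[str]:
    """Return names of matrices whose count of 'critical' entries exceeds threshold.

    A parameter counts as critical if its score is within the global top
    `top_fraction` of all scores (ties at the cutoff are all counted).
    """
    scores = {name: criticality_scores(weights[name], grads[name]) for name in weights}
    all_scores = sorted((s for vals in scores.values() for s in vals), reverse=True)
    k = max(1, int(len(all_scores) * top_fraction))
    cutoff = all_scores[k - 1]
    selected = []
    for name, vals in scores.items():
        n_critical = sum(1 for s in vals if s >= cutoff)
        if n_critical > threshold:
            selected.append(name)  # this matrix would receive a PEFT adapter
    return selected


# Toy demo with three hypothetical 2x2 "weight matrices" and adversarial-loss gradients.
weights = {
    "blocks.0.attn.qkv": [[1.0, -2.0], [0.5, 3.0]],
    "blocks.0.mlp.fc1": [[0.1, 0.1], [0.1, 0.1]],
    "blocks.1.attn.proj": [[2.0, 0.0], [-1.5, 1.0]],
}
grads = {
    "blocks.0.attn.qkv": [[0.5, 0.5], [0.1, 1.0]],
    "blocks.0.mlp.fc1": [[0.2, 0.2], [0.2, 0.2]],
    "blocks.1.attn.proj": [[1.0, 0.3], [1.0, 0.1]],
}
print(select_modules(weights, grads, top_fraction=0.25, threshold=1))
```

In this toy run the top 25% of scores are 3.0, 2.0 and 1.5; only `blocks.1.attn.proj` holds more than one of them, so only that matrix is selected. Lowering `threshold` to 0 would also admit `blocks.0.attn.qkv`.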


Key Contributions

  • CAAT method that identifies robustness-critical parameters and selectively fine-tunes them
  • Parameter-efficient adversarial training achieving comparable robustness to full AT with ~6% trainable parameters
  • Scalable adversarial training approach for large Vision Transformer architectures
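Tuning only a small fraction of parameters follows from attaching small adapters to a few selected matrices while the pretrained weights stay frozen. The sketch below assumes LoRA as the PEFT method (this summary does not name the one CAAT uses) and shows the adapter forward pass and the trainable-parameter count. `LoRALinear`, `rank`, `alpha`, and `trainable_fraction` are illustrative names, and the toy sizes are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen: not updated during adversarial fine-tuning
        # A (rank x d_in) starts with small random values; B (d_out x rank) starts at zero,
        # so the adapted layer initially reproduces the pretrained output exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self) -> int:
        # Only A and B receive gradients: rank * (d_in + d_out) parameters.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def trainable_fraction(adapters: List[LoRALinear], total_params: int) -> float:
    """Share of the model's parameters that receive gradient updates."""
    return sum(a.num_trainable() for a in adapters) / total_params


# Toy usage: one selected 4x4 matrix adapted with rank 1 in a hypothetical 160-parameter model.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0], [0.0, 0.0, 0.0, 4.0]]
layer = LoRALinear(W, rank=1)
print(layer.forward([1.0, 1.0, 1.0, 1.0]))  # equals W @ x while B is still zero
print(layer.num_trainable(), trainable_fraction([layer], total_params=160))
```

In a real ViT the denominator would be the full parameter count, and the fraction depends on the rank and on how many matrices pass the criticality threshold; the ~6% figure is the paper's reported result, not something this toy reproduces.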

🛡️ Threat Analysis

Input Manipulation Attack

Defends against adversarial examples via adversarial training, focusing on improving robustness to input manipulation attacks at inference time.
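Adversarial training repeatedly generates worst-case perturbed inputs and then trains on them. This summary does not say which attack CAAT trains against; the sketch below assumes untargeted L-infinity projected gradient descent (PGD), a common choice for AT, applied to a toy logistic-regression model so that the input gradient can be written analytically. `pgd_linf`, `eps`, `alpha`, and `steps` are illustrative names, and the model and numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(w: Vector, b: float, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def input_grad(w: Vector, b: float, x: Vector, y: int) -> Vector:
    """Gradient of binary cross-entropy w.r.t. the input x, for p = sigmoid(w.x + b)."""
    p = sigmoid(logit(w, b, x))
    return [(p - y) * wi for wi in w]


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def pgd_linf(w: Vector, b: float, x: Vector, y: int,
             eps: float, alpha: float, steps: int) -> Vector:
    """Untargeted L-inf PGD without random start: step up the loss, then project."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_grad(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project into the eps-ball around the clean input, then into the valid range [0, 1].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


w, b = [2.0, -1.0], 0.0
x_clean, label = [0.5, 0.5], 1
x_adv = pgd_linf(w, b, x_clean, label, eps=0.1, alpha=0.05, steps=5)
print(x_adv, logit(w, b, x_clean), logit(w, b, x_adv))
```

The perturbation pushes the logit for the true class down while staying within `eps` of the clean input. In an adversarial-training loop, the trainable parameters (in CAAT's setting, only those being fine-tuned) would then be updated to minimize the loss on `x_adv` rather than on `x_clean`.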


Details

Domains
vision
Model Types
transformer
Threat Tags
inference_time, digital
Applications
image classification