Robustness Feature Adapter for Efficient Adversarial Training
Quanwei Wu¹, Jun Guo¹, Wei Wang², Yi Wang¹
Published on arXiv (arXiv:2508.17680)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
A feature-space adapter significantly improves computational efficiency and robust generalization to unseen attacks, while eliminating robust overfitting across both CNN and ViT architectures.
RFA (Robustness Feature Adapter)
Novel technique introduced
Adversarial training (AT) with projected gradient descent (PGD) is the most popular method for improving model robustness against adversarial attacks. However, the computational overhead becomes prohibitively large when AT is applied to large backbone models, and AT is also known to suffer from robust overfitting. This paper addresses both problems simultaneously, toward building more trustworthy foundation models. In particular, we propose a new adapter-based approach that performs efficient AT directly in the feature space. We show that the proposed adapter-based approach improves inner-loop convergence quality and thereby eliminates robust overfitting. As a result, it significantly increases computational efficiency and improves model accuracy by generalizing adversarial robustness to unseen attacks. We demonstrate the effectiveness of the new adapter-based approach across different backbone architectures and in AT at scale.
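The min-max structure the abstract refers to (a PGD inner maximization nested inside an SGD outer minimization) can be sketched on a toy logistic model. Everything below (the model, the function names `pgd_attack` and `adversarial_train`, and the hyperparameters) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

def loss_and_grads(w, x, y):
    """Binary cross-entropy of a logistic model p = sigmoid(x @ w).
    Returns (loss, dL/dw, dL/dx)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    dz = p - y  # gradient of the loss w.r.t. the logit
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, dz * x, dz * w

def pgd_attack(w, x, y, eps=0.3, alpha=0.1, steps=10):
    """Inner maximization: L-inf PGD on the input (sign ascent, then
    projection back into the eps-ball around the clean input)."""
    x_adv = x.copy()
    for _ in range(steps):
        _, _, gx = loss_and_grads(w, x_adv, y)
        x_adv = np.clip(x_adv + alpha * np.sign(gx), x - eps, x + eps)
    return x_adv

def adversarial_train(X, Y, lr=0.5, epochs=50):
    """Outer minimization: SGD on the adversarial examples found by PGD."""
    w = np.random.default_rng(0).normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # Each PGD call costs `steps` forward/backward passes through the
            # full model -- the overhead the paper sets out to reduce.
            x_adv = pgd_attack(w, x, y)
            _, gw, _ = loss_and_grads(w, x_adv, y)
            w -= lr * gw
    return w
```

The comment in the training loop marks the cost driver: with a large backbone in place of the logistic model, every inner PGD step repeats a full forward and backward pass, which is what makes standard input-space AT expensive at scale.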
Key Contributions
- Robustness Feature Adapter (RFA) module that performs adversarial perturbation directly in feature space rather than input space, reducing computational overhead of adversarial training
- Demonstrates that feature-space perturbation eliminates robust overfitting by improving inner-loop convergence quality in PGD-based AT
- Plug-in RFA design compatible with multiple backbone architectures (CNN, ViT) and usable for adversarial detection at inference time
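To illustrate the efficiency argument behind the first contribution, here is a minimal sketch of perturbing features instead of inputs, assuming a frozen backbone and a lightweight head: the backbone runs once per example, and the PGD-style inner loop touches only the feature vector, so its per-step cost no longer scales with backbone size. The function names (`feature_pgd`, `train_head`) and the plain linear head stand in for, and are not, the paper's RFA module:

```python
import numpy as np

def head_loss_grads(v, h, y):
    """Logistic head p = sigmoid(h @ v) on features h.
    Returns (loss, dL/dv, dL/dh)."""
    p = 1.0 / (1.0 + np.exp(-(h @ v)))
    dz = p - y
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, dz * h, dz * v

def feature_pgd(v, h, y, eps=0.5, alpha=0.2, steps=5):
    """Inner maximization in feature space: each step evaluates only the
    lightweight head, never the backbone."""
    h_adv = h.copy()
    for _ in range(steps):
        _, _, gh = head_loss_grads(v, h_adv, y)
        h_adv = np.clip(h_adv + alpha * np.sign(gh), h - eps, h + eps)
    return h_adv

def train_head(W1, X, Y, lr=0.5, epochs=50):
    """Adversarially train the head over perturbed features; the backbone
    W1 (here a frozen linear+ReLU layer) is forwarded once per example."""
    v = np.random.default_rng(0).normal(scale=0.1, size=W1.shape[0])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            h = np.maximum(W1 @ x, 0.0)   # single backbone pass
            h_adv = feature_pgd(v, h, y)  # cheap feature-space inner loop
            _, gv, _ = head_loss_grads(v, h_adv, y)
            v -= lr * gv
    return v
```

Compare with input-space PGD: there, each of the inner steps re-runs the whole network; here the backbone pass is hoisted out of the loop, which is the source of the computational savings the contribution claims.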
🛡️ Threat Analysis
Directly addresses adversarial robustness with a new adapter-based adversarial training defense. The RFA module crafts perturbations in feature space to make PGD-based adversarial training efficient, defending against adversarial input manipulation at inference time and generalizing to unseen attacks.