SAFER: Sharpness-Aware Layer-Selective Finetuning for Enhanced Robustness in Vision Transformers
Bhavna Gopal, Huanrui Yang, Mark Horton, Yiran Chen
Published on arXiv (arXiv:2501.01529)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
SAFER consistently improves both clean and adversarial accuracy by ~5% on average over PGD-AT baselines, with gains up to 20% across DeiT-Ti, ViT-S, ConViT-B, and Swin-B architectures.
SAFER
Novel technique introduced
Vision transformers (ViTs) have become essential backbones in advanced computer vision applications and multi-modal foundation models. Despite their strengths, ViTs remain vulnerable to adversarial perturbations, comparable to or even exceeding the vulnerability of convolutional neural networks (CNNs). Furthermore, the large parameter count and complex architecture of ViTs make them particularly prone to adversarial overfitting, often compromising both clean and adversarial accuracy. This paper mitigates adversarial overfitting in ViTs through a novel, layer-selective fine-tuning approach: SAFER. Instead of optimizing the entire model, we identify and selectively fine-tune a small subset of layers most susceptible to overfitting, applying sharpness-aware minimization to these layers while freezing the rest of the model. Our method consistently enhances both clean and adversarial accuracy over baseline approaches. Typical improvements are around 5%, with some cases achieving gains as high as 20% across various ViT architectures and datasets.
Key Contributions
- Layer-selective fine-tuning strategy that identifies ViT layers most susceptible to adversarial overfitting using sharpness measurements, fine-tuning only ~5% of layers
- Application of Sharpness-Aware Minimization (SAM) exclusively to the sharpest layers while freezing the rest, reducing adversarial overfitting without degrading clean accuracy
- Dynamic layer reselection every 10 epochs during fine-tuning, shown via ablation to be critical for SAFER's performance gains
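The selection step above can be sketched in a few lines. This is a hypothetical, dependency-free toy (the function names, the random-perturbation sharpness estimate, and the scalar "layer" abstraction are illustrative assumptions, not the paper's implementation, which measures sharpness on full ViT layers):

```python
import random

def sharpness(loss_fn, weights, rho=0.05, trials=8, seed=0):
    """Toy sharpness score: worst-case loss increase when the layer's
    weights are perturbed within an L_inf ball of radius rho."""
    rng = random.Random(seed)
    base = loss_fn(weights)
    worst = base
    for _ in range(trials):
        eps = [rho * (2 * rng.random() - 1) for _ in weights]
        worst = max(worst, loss_fn([w + e for w, e in zip(weights, eps)]))
    return worst - base

def select_sharpest(layers, loss_fns, frac=0.05):
    """Rank layers by sharpness and keep the top fraction (~5% in SAFER);
    the remaining layers would stay frozen during fine-tuning."""
    scores = {name: sharpness(loss_fns[name], w) for name, w in layers.items()}
    k = max(1, int(len(layers) * frac))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In the paper's setting, `select_sharpest` would be re-run every 10 epochs (the dynamic reselection in the ablation), and a SAM update would be applied only to the returned layers.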
🛡️ Threat Analysis
The paper proposes a defense against adversarial-perturbation attacks: it improves the adversarial robustness of ViTs through a novel adversarial-training technique (layer-selective SAM fine-tuning), evaluated directly against PGD, AutoAttack, FAB, StAdv, and PIXEL attacks at inference time.
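For context on the threat model, the PGD attack named above can be sketched on a toy linear classifier. This is a minimal illustrative example, not the paper's evaluation code; the linear loss `-y * (w·x)` and all names here are assumptions:

```python
def pgd_attack(x, y, w, eps=0.1, alpha=0.02, steps=10):
    """L_inf PGD sketch: repeatedly step the input in the sign of the
    loss gradient, projecting back into the eps-ball around the clean input.
    Loss is -y * dot(w, x), so its gradient w.r.t. x is -y * w."""
    x0 = list(x)
    x_adv = list(x)
    for _ in range(steps):
        grad = [-y * wi for wi in w]
        # signed gradient ascent step
        x_adv = [xi + alpha * (1 if g > 0 else -1 if g < 0 else 0)
                 for xi, g in zip(x_adv, grad)]
        # project back into the L_inf ball around x0
        x_adv = [min(max(xi, x0i - eps), x0i + eps)
                 for xi, x0i in zip(x_adv, x0)]
    return x_adv
```

Defenses like SAFER are measured by how little the classification margin degrades under such perturbations.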