
Published on arXiv

2604.19724

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Adversarially trained ViTs achieve nearly zero robust training loss and robust generalization error when the signal-to-noise ratio satisfies certain conditions and the perturbation budget is moderate.


Despite the remarkable success of Vision Transformers (ViTs) across a wide range of vision tasks, recent studies have revealed that they remain vulnerable to adversarial examples, much like Convolutional Neural Networks (CNNs). A common empirical defense is adversarial training, yet the theoretical underpinnings of its robustness in ViTs remain largely unexplored. In this work, we present the first theoretical analysis of adversarial training for simplified ViT architectures. We show that when the signal-to-noise ratio satisfies a certain condition and the perturbation budget is moderate, adversarial training enables ViTs to achieve nearly zero robust training loss and robust generalization error. Remarkably, this yields strong generalization even in the presence of overfitting, a phenomenon known as benign overfitting that was previously observed only in CNNs (with adversarial training). Experiments on both synthetic and real-world datasets further validate our theoretical findings.
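To make the roles of the signal-to-noise ratio and the perturbation budget a little more concrete, here is a minimal toy sketch. It is not the paper's model: it assumes a signal-noise patch data model (one patch carries y·μ, the others are Gaussian noise), replaces the attention layer with a shared token-wise linear readout plus sum pooling, and uses an ℓ∞ budget ε on every patch coordinate. Under those assumptions the inner maximization of adversarial training has a closed form: the worst-case perturbation lowers every margin by ε·P·‖w‖₁, where P is the number of patches. All function names and parameter values (`make_patch_data`, `signal_norm`, `eps`, and so on) are hypothetical.

```python
import numpy as np


def make_patch_data(n, num_patches=4, dim=20, signal_norm=8.0, noise_std=1.0, seed=0):
    """Hypothetical signal-noise patch data: one patch holds y * mu, the rest are noise."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    mu = np.zeros(dim)
    mu[0] = signal_norm                                   # signal direction e_0, ||mu|| = signal_norm
    X = rng.normal(0.0, noise_std, size=(n, num_patches, dim))
    signal_idx = rng.integers(0, num_patches, size=n)     # signal patch position varies per example
    X[np.arange(n), signal_idx] = y[:, None] * mu
    return X, y


def robust_margins(w, X, y, eps):
    """Worst-case margins y * f(X + delta) over per-coordinate |delta| <= eps.

    f(X) = sum_p <w, x_p> (shared linear readout + sum pooling, no attention).
    The minimizing perturbation is delta_p = -eps * y * sign(w) on every patch,
    which lowers each margin by eps * num_patches * ||w||_1.
    """
    pooled = X.sum(axis=1)
    return y * (pooled @ w) - eps * X.shape[1] * np.abs(w).sum()


def robust_loss(w, X, y, eps):
    """Mean logistic loss on worst-case margins (the robust training loss)."""
    return np.logaddexp(0.0, -robust_margins(w, X, y, eps)).mean()


def adversarial_train(X, y, eps, lr=0.5, steps=300):
    """Subgradient descent on the robust logistic loss, starting from w = 0."""
    pooled = X.sum(axis=1)
    penalty = eps * X.shape[1]
    w = np.zeros(X.shape[2])
    for _ in range(steps):
        m = robust_margins(w, X, y, eps)
        s = np.exp(-np.logaddexp(0.0, m))               # sigmoid(-m), computed stably
        # d/dw softplus(-m_i) = -sigmoid(-m_i) * (y_i * pooled_i - penalty * sign(w))
        grad = -(s[:, None] * (y[:, None] * pooled - penalty * np.sign(w))).mean(axis=0)
        w -= lr * grad
    return w


if __name__ == "__main__":
    X_train, y_train = make_patch_data(200, seed=0)
    X_test, y_test = make_patch_data(1000, seed=1)
    for eps in (0.25, 2.5):                             # eps * P = 1 (moderate) vs 10 (> signal_norm)
        w = adversarial_train(X_train, y_train, eps)
        train_loss = robust_loss(w, X_train, y_train, eps)
        test_acc = (robust_margins(w, X_test, y_test, eps) > 0).mean()
        print(f"eps={eps}: robust train loss={train_loss:.4f}, robust test acc={test_acc:.3f}")
```

In this toy, a moderate budget (ε·P well below `signal_norm`) should drive the robust training loss close to zero with high robust test accuracy. When ε·P exceeds the averaged signal, the mean robust margin is non-positive for every w, so by Jensen's inequality the robust loss cannot fall below log 2. This only loosely mirrors the "moderate perturbation budget" condition and says nothing about the paper's actual thresholds.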


Key Contributions

  • First theoretical analysis of adversarial training for Vision Transformers, showing the benign overfitting phenomenon
  • Establishes conditions under which ViTs achieve near-zero robust training loss and robust generalization error
  • Proves benign overfitting can occur in adversarial settings for ViTs under moderate perturbation budgets and sufficient signal-to-noise ratios

🛡️ Threat Analysis

Input Manipulation Attack

The paper analyzes adversarial training as a defense against adversarial examples (input manipulation attacks) on Vision Transformers. Its core contribution is characterizing robustness under adversarial perturbations with bounded budgets, which directly concerns defending against ML01 attacks.
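As a hedged illustration of a bounded-budget, inference-time digital attack on an attention model, the sketch below runs projected gradient descent (PGD) under an ℓ∞ budget against a hypothetical one-layer, single-head softmax-attention classifier with random weights. The ℓ∞ norm, the model, the step-size rule (ε/4), and the use of finite-difference gradients in place of autodiff are all assumptions chosen to keep the example self-contained; this summary does not state which norm or attack the paper uses. Adversarial training runs an attack of this kind in its inner loop and fits the model on the resulting perturbed inputs.

```python
import numpy as np


def attention_logit(X, Wq, Wk, Wv, w_out):
    """Hypothetical one-layer, single-head softmax-attention classifier.

    X has shape (num_patches, dim); returns a scalar logit from the mean-pooled
    attention output. This is a toy stand-in, not the paper's architecture.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)    # stabilize the softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)          # row-wise softmax
    return float((attn @ V).mean(axis=0) @ w_out)


def numerical_grad(f, X, h=1e-5):
    """Central finite-difference gradient of a scalar function f at X (toy sizes only)."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[idx] += h
        X_minus[idx] -= h
        grad[idx] = (f(X_plus) - f(X_minus)) / (2 * h)
    return grad


def pgd_linf(margin_fn, X, eps, step=None, iters=20):
    """l_inf PGD that lowers margin_fn; returns the best perturbation found and its margin."""
    step = eps / 4 if step is None else step
    delta = np.zeros_like(X)
    best_delta, best_margin = delta.copy(), margin_fn(X)   # delta = 0 is a valid fallback
    for _ in range(iters):
        g = numerical_grad(margin_fn, X + delta)
        delta = np.clip(delta - step * np.sign(g), -eps, eps)  # signed step, then project
        m = margin_fn(X + delta)
        if m < best_margin:
            best_delta, best_margin = delta.copy(), m
    return best_delta, best_margin


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, d = 4, 8
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    w_out = rng.normal(size=d)
    X, y = rng.normal(size=(P, d)), 1.0
    margin = lambda Z: y * attention_logit(Z, Wq, Wk, Wv, w_out)
    delta, adv_margin = pgd_linf(margin, X, eps=0.1)
    print(f"clean margin={margin(X):.4f}, adversarial margin={adv_margin:.4f}")
```

Keeping the best iterate (starting from δ = 0) guarantees the returned margin is never above the clean one. Finite differences cost 2·P·d forward passes per step, so this approach only makes sense at toy sizes; a real implementation would use automatic differentiation.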


Details

Domains
vision
Model Types
transformer
Threat Tags
inference_time, digital
Datasets
MNIST, synthetic datasets
Applications
image classification