Defense · 2025

Scaling Adversarial Training via Data Selection

Youran Ye, Dejin Wang, Ajinkya Bhandare

0 citations · 7 references · arXiv


Published on arXiv: 2512.22069

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves robustness comparable to full PGD adversarial training while reducing adversarial computation by up to 50% through informed sample selection.

Selective Adversarial Training

Novel technique introduced


Projected Gradient Descent (PGD) is a strong and widely used first-order adversarial attack, yet its computational cost scales poorly, as all training samples undergo identical iterative inner-loop optimization despite contributing unequally to robustness. Motivated by this inefficiency, we propose \emph{Selective Adversarial Training}, which perturbs only a subset of critical samples in each minibatch. Specifically, we introduce two principled selection criteria: (1) margin-based sampling, which prioritizes samples near the decision boundary, and (2) gradient-matching sampling, which selects samples whose gradients align with the dominant batch optimization direction. Adversarial examples are generated only for the selected subset, while the remaining samples are trained cleanly using a mixed objective. Experiments on MNIST and CIFAR-10 show that the proposed methods achieve robustness comparable to, or even exceeding, full PGD adversarial training, while reducing adversarial computation by up to $50\%$, demonstrating that informed sample selection is sufficient for scalable adversarial robustness.
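The margin-based criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: here the margin is taken as the gap between the top-1 and runner-up logits, and the subset size `k` is an assumed hyperparameter.

```python
import numpy as np

def margin_select(logits, k):
    """Pick the k samples closest to the decision boundary.

    Margin = top-1 logit minus runner-up logit; a small margin means the
    sample sits near the boundary, so it is prioritized for perturbation.
    """
    part = np.partition(logits, -2, axis=1)  # top-2 logits land in the last two columns
    margins = part[:, -1] - part[:, -2]      # top-1 minus runner-up, per sample
    return np.argsort(margins)[:k]           # indices of the k smallest margins

# toy batch of 4 samples over 3 classes
logits = np.array([[2.0, 1.9, 0.1],   # tiny margin -> near the boundary
                   [5.0, 0.5, 0.2],   # large margin -> confident
                   [1.0, 0.9, 0.8],   # tiny margin
                   [3.0, 0.1, 0.0]])  # large margin
selected = margin_select(logits, k=2)
print(sorted(selected.tolist()))      # -> [0, 2]
```

Only the samples at the returned indices would pass through the PGD inner loop; the rest contribute their clean loss to the mixed objective.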


Key Contributions

  • Margin-based selective adversarial training that prioritizes perturbing samples near the decision boundary
  • Gradient-matching selective adversarial training that selects samples whose gradients align with the dominant batch optimization direction
  • Empirical validation on MNIST and CIFAR-10 showing up to 50% reduction in adversarial computation with robustness comparable to or exceeding full PGD adversarial training
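The gradient-matching criterion from the second contribution can likewise be sketched. This is an illustrative reading, not the paper's implementation: per-sample gradients are assumed to be available as a flat `(batch, dim)` array, and alignment is measured as cosine similarity against the mean batch gradient.

```python
import numpy as np

def gradient_match_select(per_sample_grads, k):
    """Pick the k samples whose gradients best align with the batch direction.

    per_sample_grads: (batch, dim) array of per-sample loss gradients.
    Alignment score = cosine similarity with the mean batch gradient.
    """
    mean_g = per_sample_grads.mean(axis=0)
    mean_g = mean_g / (np.linalg.norm(mean_g) + 1e-12)       # unit batch direction
    norms = np.linalg.norm(per_sample_grads, axis=1) + 1e-12
    cos = per_sample_grads @ mean_g / norms                  # cosine per sample
    return np.argsort(-cos)[:k]                              # k most aligned

grads = np.array([[1.0, 0.0],    # roughly along the batch direction
                  [0.9, 0.1],
                  [-1.0, 0.0],   # opposes the batch direction
                  [0.0, 1.0]])
sel = gradient_match_select(grads, k=2)
```

Samples opposing the dominant direction are left clean, concentrating the adversarial budget where perturbation reinforces the batch update.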

🛡️ Threat Analysis

Input Manipulation Attack

Proposes 'Selective Adversarial Training', a defense against adversarial input manipulation (evasion attacks) by efficiently generating PGD adversarial examples only for the most critical samples during training, achieving comparable robustness to full PGD-AT.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box · training_time · digital · untargeted
Datasets
MNIST · CIFAR-10
Applications
image classification