
Does simple trump complex? Comparing strategies for adversarial robustness in DNNs

William Brooks 1,2,3, Marelie H. Davel 1,2,4, Coenraad Mouton 1,2


Published on arXiv (2508.18019)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Component-wise ablation on CIFAR-10 with VGG-16 reveals which elements of margin-based adversarial training (DyART vs. a large-margin loss) contribute most to robustness under AutoAttack and PGD evaluation.

Dynamics-Aware Robust Training (DyART)

Existing technique evaluated (introduced in prior work, ablated here)


Deep Neural Networks (DNNs) have shown substantial success in various applications but remain vulnerable to adversarial attacks. This study aims to identify and isolate the components of two different adversarial training techniques that contribute most to increased adversarial robustness, particularly through the lens of margins in the input space -- the minimal distance between data points and decision boundaries. Specifically, we compare two methods that maximize margins: a simple approach which modifies the loss function to increase an approximation of the margin, and a more complex state-of-the-art method (Dynamics-Aware Robust Training) which builds upon this approach. Using a VGG-16 model as our base, we systematically isolate and evaluate individual components from these methods to determine their relative impact on adversarial robustness. We assess the effect of each component on the model's performance under various adversarial attacks, including AutoAttack and Projected Gradient Descent (PGD). Our analysis on the CIFAR-10 dataset reveals which elements most effectively enhance adversarial robustness, providing insights for designing more robust DNNs.
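To make the notion of an input-space margin concrete, the sketch below computes it exactly for a toy linear classifier, where the distance from a point to the decision boundary between classes has a closed form, and penalizes small margins with a hinge-style loss. This is an illustrative simplification, not the paper's method: for a deep network the margin must be approximated (e.g. via a first-order Taylor expansion, as in the large-margin loss the paper compares), and the model, names, and `gamma` parameter here are hypothetical.

```python
import numpy as np

def input_margin(W, b, x, y):
    """Signed distance from x to the nearest decision boundary of class y
    for the linear classifier f(x) = W @ x + b (exact in the linear case)."""
    logits = W @ x + b
    margins = []
    for k in range(W.shape[0]):
        if k == y:
            continue
        # distance to the boundary separating class y from class k
        margins.append((logits[y] - logits[k]) / np.linalg.norm(W[y] - W[k]))
    return min(margins)  # negative => x is already misclassified

def margin_loss(W, b, x, y, gamma=1.0):
    """Hinge-style penalty: zero once the margin exceeds the target gamma."""
    return max(0.0, gamma - input_margin(W, b, x, y))

# Usage: a 2-class linear model in 2-D.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])  # class 0 prefers x[0] > 0
b = np.zeros(2)
x = np.array([0.5, 0.3])
m = input_margin(W, b, x, 0)   # distance of x from the boundary x[0] = 0
```

Maximizing this quantity during training pushes the decision boundary away from the data, which is why both methods the paper compares target it, directly or via an approximation.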


Key Contributions

  • Systematic component-level evaluation of DyART (Xu et al.) and large-margin loss (Elsayed et al.) adversarial training techniques to isolate which elements drive robustness
  • Identification of the most impactful components of margin-based adversarial training on CIFAR-10 using VGG-16
  • Practical insights into which aspects of margin maximization matter most for designing adversarially robust classifiers

🛡️ Threat Analysis

Input Manipulation Attack

Paper analyzes defenses against adversarial example attacks — specifically margin-based adversarial training techniques evaluated against AutoAttack and PGD, the canonical input manipulation attacks for image classifiers.
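For readers unfamiliar with the attack side of this evaluation, the following is a minimal sketch of l_inf Projected Gradient Descent. A real robustness evaluation would attack the trained VGG-16 through autograd; here a linear softmax model stands in so the input gradient has a closed form, and all names and parameter values (`eps`, `alpha`, `steps`) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def pgd_linf(W, b, x, y, eps=0.3, alpha=0.1, steps=10):
    """Ascend the cross-entropy loss in sign-of-gradient steps, projecting
    back into the l_inf ball of radius eps around the clean input x.
    For the linear model f(x) = W @ x + b the input gradient of the
    cross-entropy loss is W.T @ (softmax(f(x)) - onehot(y))."""
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[y] = 1.0
    for _ in range(steps):
        grad = W.T @ (softmax(W @ x_adv + b) - onehot)
        x_adv = x_adv + alpha * np.sign(grad)     # gradient-sign ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
    return x_adv

# Usage: the clean point is classified as class 0; PGD pushes it across
# the boundary x[0] = 0 while staying within eps of the original input.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
x = np.array([0.2, 0.0])
x_adv = pgd_linf(W, b, x, y=0)
```

A larger input-space margin means the boundary lies further than `eps` from the data point, so no perturbation inside the ball can flip the label; this is the geometric link between the margin-maximizing defenses above and the PGD/AutoAttack evaluation.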


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, inference_time, digital
Datasets
CIFAR-10
Applications
image classification