Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks
Kristina Korotkova, Aleksandr Katrutsa
Published on arXiv
2512.10936
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Frank-Wolfe projection-free methods outperform projection-based baselines (PGD, FGSM) under l1 constraints by exploiting sparse solution structure, while offering competitive performance under l2 and l-inf norms across logistic regression, CNN, and ViT architectures.
Frank-Wolfe adversarial attack
Novel technique introduced
The construction of adversarial attacks on neural networks is a crucial challenge for their deployment in production services. Estimating the adversarial robustness of a neural network requires a fast and efficient way to construct adversarial attacks. Since adversarial attack construction can be formalized as a constrained optimization problem, we study how to build efficient and effective adversarial attacks from a numerical optimization perspective. Specifically, we propose using advanced projection-free methods, namely modified Frank-Wolfe methods, to construct white-box adversarial attacks on given input data. We perform a theoretical and numerical evaluation of these methods and compare them with standard approaches based on projection operations or geometric intuition. Numerical experiments are performed on the MNIST and CIFAR-10 datasets using a multiclass logistic regression model, convolutional neural networks (CNNs), and a Vision Transformer (ViT).
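To make the projection-free idea concrete, here is a minimal sketch of a Frank-Wolfe attack step over an l-inf ball. The function name, step count, and the classic 2/(t+2) step size are illustrative assumptions, not the paper's exact variants: each iterate is a convex combination of feasible points, so no projection is ever needed.

```python
import numpy as np

def fw_linf_attack(grad_fn, x0, eps=0.1, steps=20):
    """Illustrative Frank-Wolfe attack maximizing a loss over the l-inf
    ball {x : ||x - x0||_inf <= eps} without any projection step."""
    x = x0.copy()
    for t in range(steps):
        g = grad_fn(x)                   # gradient of the loss at x
        # Linear maximization oracle (LMO): the vertex of the l-inf ball
        # maximizing <g, s> is x0 + eps * sign(g).
        s = x0 + eps * np.sign(g)
        gamma = 2.0 / (t + 2)            # classic FW step size (assumed)
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x
```

Because the update is a convex combination of points inside the ball, the iterate remains feasible by construction, which is exactly what removes the projection step that PGD requires.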
Key Contributions
- Systematic empirical evaluation of advanced Frank-Wolfe projection-free variants for adversarial example generation under l1, l2, and l-inf constraints
- Theoretical and numerical comparison of projection-free methods against projection-based baselines (FGSM, PGD) across norm types
- Analysis of sparsity properties of resulting adversarial perturbations and practical recommendations per norm/model class
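The sparsity analysis in the third contribution follows directly from the geometry of the l1 ball: its linear maximization oracle returns a vertex, which perturbs a single coordinate. A minimal sketch (function name assumed for illustration):

```python
import numpy as np

def lmo_l1(g, x0, eps):
    """LMO over the l1 ball {x : ||x - x0||_1 <= eps}: the maximizing
    vertex puts the entire eps budget on the single coordinate with the
    largest |gradient|, so each Frank-Wolfe iterate changes few pixels."""
    i = np.argmax(np.abs(g))
    s = x0.copy()
    s[i] += eps * np.sign(g[i])
    return s
```

After T Frank-Wolfe iterations the perturbation is a convex combination of at most T such vertices, which is why l1-constrained Frank-Wolfe attacks naturally produce sparse perturbations, unlike projection-based updates that spread mass over all coordinates.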
🛡️ Threat Analysis
The paper constructs white-box adversarial perturbations at inference time using gradient-based Frank-Wolfe optimization: it attacks image classifiers directly by maximizing the cross-entropy loss under norm-ball constraints, a canonical input manipulation attack.
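For contrast with the projection-free methods above, the PGD baseline the paper compares against interleaves gradient ascent with an explicit Euclidean projection back onto the norm ball. A minimal sketch of one l2 PGD step (function name and learning rate are assumed):

```python
import numpy as np

def pgd_l2_step(grad_fn, x, x0, eps, lr=0.05):
    """One PGD ascent step: gradient ascent on the loss, then Euclidean
    projection back onto the l2 ball {x : ||x - x0||_2 <= eps}."""
    x = x + lr * grad_fn(x)
    delta = x - x0
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta *= eps / norm  # project onto the ball surface
    return x0 + delta
```

The projection keeps iterates feasible but is what Frank-Wolfe methods avoid; for the l1 ball in particular, projection is more expensive and destroys the sparse vertex structure that the LMO exploits.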