defense 2025

Tight Robustness Certificates and Wasserstein Distributional Attacks for Deep Neural Networks

Bach C. Le , Tung V. Dao , Binh T. Nguyen , Hong T.M. Chu

0 citations · 66 references · arXiv

Published on arXiv

2510.10000

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

WDA and WDA++ consistently find stronger adversarial examples than state-of-the-art methods including AutoAttack, while the WDRO certificate framework yields tighter upper bounds than existing global Lipschitz approaches.

WDA/WDA++ (Wasserstein Distributional Attack)

Novel technique introduced


Wasserstein distributionally robust optimization (WDRO) provides a framework for adversarial robustness, yet existing methods based on global Lipschitz continuity or strong duality often yield loose upper bounds or require prohibitive computation. We address these limitations with a primal approach and adopt a notion of exact Lipschitz certificates to tighten this upper bound of WDRO. For ReLU networks, we leverage the piecewise-affine structure on activation cells to obtain an exact tractable characterization of the corresponding WDRO problem. We further extend our analysis to modern architectures with smooth activations (e.g., GELU, SiLU), such as Transformers. Additionally, we propose novel Wasserstein Distributional Attacks (WDA, WDA++) that construct candidates for the worst-case distribution. Compared to existing attacks that are restricted to point-wise perturbations, our methods offer greater flexibility in the number and location of attack points. Extensive evaluations demonstrate that our proposed framework achieves competitive robust accuracy against state-of-the-art baselines while offering tighter certificates than existing methods. Our code is available at https://github.com/OLab-Repo/WDA.
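For orientation, the global Lipschitz upper bound that the abstract calls loose can be written out as follows. The notation here is an assumption for illustration and is not taken from the paper: $P_N$ is the empirical distribution, $\ell$ the loss, $\varepsilon$ the radius of a Wasserstein-1 ball, and $\mathrm{Lip}(\ell)$ the global Lipschitz constant of $\ell$ with respect to the ground metric of $W_1$.

```latex
% Kantorovich-Rubinstein duality gives, for any Q,
%   | E_Q[\ell] - E_{P_N}[\ell] |  <=  Lip(\ell) * W_1(Q, P_N).
% Taking the supremum over the Wasserstein ball yields the global bound:
\[
  \sup_{Q \,:\, W_1(Q, P_N) \le \varepsilon} \mathbb{E}_{Q}\big[\ell(Z)\big]
  \;\le\;
  \mathbb{E}_{P_N}\big[\ell(Z)\big] + \varepsilon \,\mathrm{Lip}(\ell).
\]
```

The bound charges every unit of transport budget at the steepest slope of $\ell$ anywhere in the input space, so it can be loose whenever the loss is much flatter near the data than in its worst region.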


Key Contributions

  • Exact tractable WDRO upper/lower bounds for ReLU networks using piecewise-affine activation cell structure and tight local Lipschitz certificates, with a sufficient condition for bound tightness.
  • Extension of WDRO analysis to smooth-activation architectures (GELU, SiLU, Transformers) via gradient-norm-based exact Lipschitz characterization.
  • Novel Wasserstein Distributional Attacks (WDA and adaptive WDA++) that optimize over distributions supported on 2N points, strictly generalizing point-wise perturbation attacks and consistently outperforming AutoAttack-era baselines.
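The piecewise-affine structure mentioned in the first contribution can be illustrated with a minimal sketch. It assumes a hypothetical two-layer ReLU network with random weights (`W1`, `b1`, `W2`, `b2` are made up for the example) and works at the level of the network output rather than the loss, so it is not the paper's certificate. It only shows that on a fixed activation cell the network is affine, and that the Lipschitz constant on that cell is never larger than the global product-of-norms bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer ReLU network f(x) = W2 @ relu(W1 @ x + b1) + b2.
W1 = rng.standard_normal((8, 4))
b1 = rng.standard_normal(8)
W2 = rng.standard_normal((3, 8))
b2 = rng.standard_normal(3)


def forward(x):
    """Evaluate the network at x."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2


def cell_jacobian(x):
    """Jacobian of f on the activation cell containing x.

    Inside a cell the ReLU on/off pattern is fixed, so f is affine there
    and its Jacobian is W2 @ diag(pattern) @ W1.
    """
    pattern = (W1 @ x + b1 > 0).astype(float)
    return W2 @ (pattern[:, None] * W1)


x = rng.standard_normal(4)
J = cell_jacobian(x)

# Lipschitz constant (l2 to l2) of f restricted to this cell: spectral norm of J.
local_lip = np.linalg.norm(J, 2)

# Global bound: product of layer spectral norms (ReLU itself is 1-Lipschitz).
global_bound = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2)

print(f"local: {local_lip:.3f}  global bound: {global_bound:.3f}")
```

The gap between the two printed values is the kind of slack that a cell-wise (local) analysis can remove relative to a global Lipschitz bound.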

🛡️ Threat Analysis

Input Manipulation Attack

The paper makes a dual contribution: tighter certified robustness bounds via exact Lipschitz certificates (defense), and novel Wasserstein Distributional Attacks (WDA, WDA++) that craft worst-case adversarial distributions at inference time (attack). Both directly target adversarial example robustness, the core of ML01.
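To see why a distribution supported on 2N points generalizes point-wise perturbations, consider the following minimal sketch. It is not the WDA or WDA++ algorithm. It assumes a Wasserstein-1 budget with an l2 ground cost, and the helper `transport_cost` is a hypothetical name introduced here. Each of the N samples keeps part of its mass in place and moves the rest to one perturbed point, which gives at most 2N support points.

```python
import numpy as np


def transport_cost(deltas, masses):
    """Cost of the coupling that moves a fraction masses[i] of sample i by deltas[i].

    Each of the N samples carries probability 1/N, so this cost is an upper
    bound on the Wasserstein-1 distance (l2 ground cost) between the empirical
    distribution and the perturbed one.
    """
    deltas = np.asarray(deltas, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return float(np.mean(masses * np.linalg.norm(deltas, axis=1)))


eps = 0.5          # assumed transport budget
N, dim = 4, 3      # toy sizes
rng = np.random.default_rng(0)
directions = rng.standard_normal((N, dim))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Point-wise attack: every sample moves entirely (mass 1) by exactly eps.
pointwise_cost = transport_cost(eps * directions, np.ones(N))

# Distributional candidate: spend the whole budget on sample 0 only,
# moving half of its mass far away and leaving the other samples untouched.
masses = np.zeros(N)
masses[0] = 0.5
deltas = np.zeros((N, dim))
deltas[0] = (N * eps / masses[0]) * directions[0]
distributional_cost = transport_cost(deltas, masses)

print(pointwise_cost, distributional_cost, np.linalg.norm(deltas[0]))
```

Both candidates use the same transport budget, but the second one moves part of a single sample much farther than `eps`, which no point-wise attack with a per-sample radius of `eps` can express.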


Details

Domains
vision, nlp
Model Types
cnn, transformer
Threat Tags
white_box, inference_time, untargeted, digital
Datasets
CIFAR-10, CIFAR-100, ImageNet
Applications
image classification