Stability and Generalization of Adversarial Diffusion Training
Hesam Hosseini, Ying Cao, Ali H. Sayed
Published on arXiv (2509.19234)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The generalization error of adversarial diffusion training grows with both the perturbation strength ε and the number of iterations T in decentralized settings, mirroring single-agent findings and suggesting early stopping as a practical mitigation.
Algorithmic stability is an established tool for analyzing generalization. While adversarial training enhances model robustness, it often suffers from robust overfitting and an enlarged generalization gap. Although recent work has established the convergence of adversarial training in decentralized networks, its generalization properties remain unexplored. This work presents a stability-based generalization analysis of adversarial training under the diffusion strategy for convex losses. We derive a bound showing that the generalization error grows with both the adversarial perturbation strength and the number of training steps, a finding consistent with the single-agent case but novel for decentralized settings. Numerical experiments on logistic regression validate these theoretical predictions.
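The setting analyzed above can be sketched as an adapt-then-combine diffusion loop: each agent first takes a local gradient step on its worst-case (adversarially perturbed) loss, then averages its iterate with its neighbors. A minimal sketch follows; the ring topology, data sizes, step size, and synthetic data are illustrative assumptions, not the paper's experimental setup. For logistic regression with an ℓ∞ perturbation ball, the inner maximization has the closed form x' = x − ε·y·sign(w), which the sketch uses directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: K agents on a ring, each holding n local samples.
K, n, d = 4, 50, 5
eps, mu, T = 0.1, 0.05, 200  # perturbation strength, step size, iterations

# Doubly stochastic combination matrix for a ring topology.
A = np.zeros((K, K))
for k in range(K):
    A[k, k] = 0.5
    A[k, (k - 1) % K] = 0.25
    A[k, (k + 1) % K] = 0.25

# Synthetic logistic-regression data per agent, labels in {-1, +1}.
w_true = rng.normal(size=d)
X = rng.normal(size=(K, n, d))
y = np.sign(X @ w_true + 0.1 * rng.normal(size=(K, n)))

def grad(w, Xk, yk):
    """Gradient of the average logistic loss log(1 + exp(-y w.x)) at w."""
    m = yk * (Xk @ w)
    return -(Xk * (yk / (1 + np.exp(m)))[:, None]).mean(axis=0)

w = np.zeros((K, d))
for t in range(T):
    psi = np.empty_like(w)
    for k in range(K):
        # Inner maximization (exact for linear models): the worst-case
        # l_inf perturbation of strength eps pushes each feature against
        # the margin, x' = x - eps * y * sign(w).
        X_adv = X[k] - eps * y[k][:, None] * np.sign(w[k])
        # Adapt: local gradient step on the adversarial loss.
        psi[k] = w[k] - mu * grad(w[k], X_adv, y[k])
    # Combine: diffusion averaging with neighbors.
    w = A @ psi
```

The adapt-then-combine order is what distinguishes the diffusion strategy from consensus-style updates: each agent shares its already-updated iterate, which is what the paper's stability analysis tracks across iterations.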
Key Contributions
- First stability-based generalization bound for adversarial training under the distributed diffusion strategy for convex losses
- Unified framework that reduces to known single-agent adversarial training and decentralized standard training bounds in their respective limits
- Empirical validation on logistic regression confirming theoretical dependence on perturbation strength ε and training steps T, with additional evidence on network topology effects
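Since the bound grows with T, the early-stopping mitigation amounts to monitoring a held-out adversarial loss and halting once it stops improving. A minimal sketch, with hypothetical `step` and `eval_adv_loss` callbacks standing in for one training iteration and the held-out adversarial loss (neither is an API from the paper):

```python
import numpy as np

def early_stop_training(step, eval_adv_loss, max_iters=1000, patience=20):
    """Run step() until the held-out adversarial loss stops improving.

    step and eval_adv_loss are hypothetical callbacks: one training
    iteration, and the adversarial loss on a held-out set. Returns the
    iteration index of the best loss and the best loss itself.
    """
    best, best_t, stale = np.inf, 0, 0
    for t in range(max_iters):
        step()
        loss = eval_adv_loss()
        if loss < best - 1e-6:
            best, best_t, stale = loss, t, 0
        else:
            stale += 1
            if stale >= patience:
                break  # loss has plateaued or risen: stop before robust overfitting grows
    return best_t, best
```

Stopping on the *adversarial* held-out loss rather than the clean loss matters here, since robust overfitting can set in while clean validation loss still improves.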
🛡️ Threat Analysis
The paper directly analyzes adversarial training (a defense against adversarial input manipulation), deriving generalization bounds that characterize the robust overfitting problem inherent to this class of defenses.