S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights
Gaojie Jin 1, Xinping Yi 2, Wei Huang 3, Sven Schewe 3, Xiaowei Huang 3
Published on arXiv
2603.01264
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
S²O tightens the PAC-Bayesian robust generalization bound and empirically improves adversarial robustness both as a standalone method and when combined with existing adversarial training techniques
S²O (Second-Order Statistics Optimization)
Novel technique introduced
Adversarial training has emerged as a highly effective way to improve the robustness of deep neural networks (DNNs). It is typically conceptualized as a min-max optimization problem over model weights and adversarial perturbations, where the weights are optimized using gradient descent methods, such as SGD. In this paper, we propose a novel approach by treating model weights as random variables, which paves the way for enhancing adversarial training through Second-Order Statistics Optimization (S²O) over model weights. We challenge and relax a prevalent, yet often unrealistic, assumption in prior PAC-Bayesian frameworks: the statistical independence of weights. From this relaxation, we derive an improved PAC-Bayesian robust generalization bound. Our theoretical developments suggest that optimizing the second-order statistics of weights can substantially tighten this bound. We complement this theoretical insight by conducting an extensive set of experiments that demonstrate that S²O not only enhances the robustness and generalization of neural networks when used in isolation, but also seamlessly augments other state-of-the-art adversarial training techniques. The code is available at https://github.com/Alexkael/S2O.
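For readers unfamiliar with the min-max formulation mentioned in the abstract, it is commonly written as below. This is the standard textbook form, not notation taken from the paper: the symbols θ (weights), δ (perturbation), ε (perturbation budget), the p-norm, the loss, and the data distribution are all assumptions introduced here for illustration.

```latex
\min_{\theta} \;
\mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[
  \max_{\|\delta\|_{p} \le \epsilon}
  \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)
\Big]
```

The inner maximization searches for the worst-case perturbation of each input, and the outer minimization fits the weights against those perturbed inputs.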
Key Contributions
- Relaxes the statistical independence assumption in PAC-Bayesian frameworks by modeling second-order weight statistics (correlation/covariance matrices), yielding a tighter robust generalization bound
- Proposes S²O (Second-Order Statistics Optimization), a novel adversarial training paradigm that treats weights as random variables and optimizes their covariance to improve robustness
- Demonstrates empirically that S²O improves robustness on its own and can also be combined with existing state-of-the-art adversarial training methods to strengthen them further
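To make "second-order statistics of weights" concrete, the following is a minimal sketch of how a covariance and correlation matrix could be estimated from samples of a weight vector. It is not the estimator used by S²O, and it does not optimize anything; the function name `second_order_stats`, the synthetic samples, and the idea of obtaining weight samples at all are assumptions made purely for illustration. Under an independence assumption the off-diagonal correlations would be zero, so nonzero off-diagonal entries are exactly what that assumption ignores.

```python
import numpy as np


def second_order_stats(weight_samples):
    """Empirical covariance and correlation of flattened weight samples.

    weight_samples: array of shape (n_samples, n_weights); each row is one
    draw of the weight vector (hypothetical source, e.g. saved checkpoints).
    """
    w = np.asarray(weight_samples, dtype=float)
    cov = np.cov(w, rowvar=False)        # shape (n_weights, n_weights)
    std = np.sqrt(np.diag(cov))          # per-weight standard deviation
    corr = cov / np.outer(std, std)      # normalize covariance to correlation
    return cov, corr


# Synthetic stand-in for weight samples: weights 0 and 1 are correlated,
# weight 2 is independent of both. Illustration only.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8, 0.0],
                     [0.8, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
samples = rng.multivariate_normal(np.zeros(3), true_cov, size=5000)

cov, corr = second_order_stats(samples)

# Off-diagonal mass: zero if the weights were uncorrelated.
off_diag_norm = np.linalg.norm(corr - np.eye(3))
```

Here `off_diag_norm` is simply one possible scalar summary of how far the weights are from being uncorrelated; which statistic the paper actually regularizes is not stated in this summary.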
🛡️ Threat Analysis
Proposes a defense against adversarial input manipulation attacks: it enhances adversarial training (the standard min-max framework) with second-order weight statistics, tightening the robust generalization bound and improving model robustness at inference time.