Robustifying Diffusion-Denoised Smoothing Against Covariate Shift
Ali Hedayatnia, Mostafa Tavassolipour, Babak Nadjar Araabi, Abdol-Hossein Vahabie
Published on arXiv (arXiv:2509.10913)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves new state-of-the-art certified accuracy against l2-adversarial perturbations on all three benchmarks (MNIST, CIFAR-10, ImageNet) by adversarially training the base classifier against the covariate shift introduced by the diffusion denoiser.
Covariate-Shift-Robust Diffusion Denoised Smoothing
Novel technique introduced
Randomized smoothing is a well-established method for achieving certified robustness against l2-adversarial perturbations. By incorporating a denoiser before the base classifier, pretrained classifiers can be seamlessly integrated into randomized smoothing without significant performance degradation. Among existing methods, Diffusion Denoised Smoothing, in which a pretrained denoising diffusion model serves as the denoiser, has produced state-of-the-art results. However, we show that employing a denoising diffusion model introduces a covariate shift via misestimation of the added noise, ultimately degrading the smoothed classifier's performance. To address this issue, we propose a novel adversarial objective function focused on the added noise of the denoising diffusion model, motivated by our analysis of the origin of the covariate shift. Our goal is to train the base classifier so that it is robust against the covariate shift introduced by the denoiser. Our method significantly improves certified accuracy across three standard classification benchmarks, MNIST, CIFAR-10, and ImageNet, achieving new state-of-the-art performance against l2-adversarial perturbations. Our implementation is publicly available at https://github.com/ahedayat/Robustifying-DDS-Against-Covariate-Shift
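The denoise-then-classify pipeline the abstract describes can be sketched in a few lines. This is a minimal, self-contained illustration, not the paper's implementation: `toy_denoiser`, `toy_classifier`, and `SIGMA` are illustrative stand-ins for the pretrained diffusion denoiser and base classifier, and the certified radius uses the standard randomized-smoothing bound R = σ·Φ⁻¹(p_A) from Cohen et al.

```python
import math
import random
from statistics import NormalDist

SIGMA = 0.5  # smoothing noise level (illustrative choice)

def toy_denoiser(x):
    # Stand-in for a one-shot diffusion denoiser; here, simple shrinkage.
    # A real diffusion model would estimate and subtract the added noise.
    return [xi / (1.0 + SIGMA ** 2) for xi in x]

def toy_classifier(x):
    # Stand-in base classifier: sign of the summed features.
    return 1 if sum(x) >= 0 else 0

def smoothed_predict(x, n=1000, seed=0):
    """Monte Carlo estimate of the smoothed classifier
    g(x) = argmax_c P[f(denoise(x + eps)) = c], eps ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    votes = [0, 0]
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, SIGMA) for xi in x]
        votes[toy_classifier(toy_denoiser(noisy))] += 1
    top = max(votes)
    p_a = min(top / n, 1.0 - 1.0 / n)  # clamp away from 1 for inv_cdf
    # Certified l2 radius R = sigma * Phi^{-1}(p_A), valid when p_A > 1/2.
    radius = SIGMA * NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return votes.index(top), radius

label, radius = smoothed_predict([0.8, 0.6, 0.7])
print(label, round(radius, 3))
```

The key point is that the base classifier only ever sees *denoised* inputs, so any systematic error in the denoiser's noise estimate shifts the distribution the classifier is evaluated on relative to the one it was trained on.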
Key Contributions
- Identifies that diffusion denoised smoothing introduces a covariate shift via noise misestimation that degrades certified accuracy
- Proposes a novel adversarial objective function targeting noise estimation errors to train the base classifier against the denoiser-induced covariate shift
- Achieves new state-of-the-art certified accuracy under l2-adversarial perturbations on MNIST, CIFAR-10, and ImageNet
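The covariate shift the contributions describe originates in the denoiser's error when estimating the added noise ε. A hedged sketch of what an adversarial objective in noise-estimate space might look like, assuming a one-shot DDPM-style denoiser and a toy linear hinge classifier (`ALPHA_BAR`, `ETA`, and all helper names here are illustrative assumptions, not the authors' exact formulation):

```python
import math
import random

ALPHA_BAR = 0.64   # assumed diffusion-schedule value at the matched timestep
ETA = 0.1          # assumed l_inf budget on the noise-estimation error

def one_shot_denoise(x_t, eps_hat):
    # DDPM posterior-mean estimate of the clean input:
    # x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar)
    s, r = math.sqrt(1.0 - ALPHA_BAR), math.sqrt(ALPHA_BAR)
    return [(xt - s * e) / r for xt, e in zip(x_t, eps_hat)]

def hinge_loss(w, x, y):
    # y in {-1, +1}; linear score w . x
    return max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))

def worst_case_eps_shift(w, x_t, eps_hat, y):
    """FGSM-style step in *noise-estimate* space: perturb eps_hat within
    an l_inf budget ETA in the direction that increases the loss."""
    s, r = math.sqrt(1.0 - ALPHA_BAR), math.sqrt(ALPHA_BAR)
    if hinge_loss(w, one_shot_denoise(x_t, eps_hat), y) == 0.0:
        return eps_hat  # hinge inactive: gradient is zero
    # d loss / d eps_hat_i = (s / r) * y * w_i while the hinge is active
    grad = [(s / r) * y * wi for wi in w]
    return [e + ETA * (1.0 if g > 0 else -1.0) for e, g in zip(eps_hat, grad)]

random.seed(0)
x0 = [1.0, -0.5, 0.8]                # toy "clean image"
y = 1                                 # its label in {-1, +1}
w = [0.2, 0.1, 0.3]                   # toy linear classifier weights
eps = [random.gauss(0.0, 1.0) for _ in x0]        # true added noise
x_t = [math.sqrt(ALPHA_BAR) * a + math.sqrt(1.0 - ALPHA_BAR) * e
       for a, e in zip(x0, eps)]                  # diffused input
eps_adv = worst_case_eps_shift(w, x_t, eps, y)    # worst-case estimate
loss_clean = hinge_loss(w, one_shot_denoise(x_t, eps), y)
loss_adv = hinge_loss(w, one_shot_denoise(x_t, eps_adv), y)
print(round(loss_clean, 3), round(loss_adv, 3))
```

Training the base classifier against `eps_adv` rather than the true `eps` is the spirit of the proposed defense: the classifier is optimized to stay correct even when the denoiser's noise estimate is off by up to the assumed budget, which is exactly the failure mode that produces the covariate shift.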
🛡️ Threat Analysis
Primary contribution is a defense against l2-adversarial perturbations via improved certified robustness — specifically enhancing diffusion-denoised randomized smoothing by training the base classifier to be robust against covariate shift introduced by the denoiser.