Defense · 2025

Robustifying Diffusion-Denoised Smoothing Against Covariate Shift

Ali Hedayatnia , Mostafa Tavassolipour , Babak Nadjar Araabi , Abdol-Hossein Vahabie



Published on arXiv (arXiv:2509.10913)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves new state-of-the-art certified accuracy against l2-adversarial perturbations on all three benchmarks (MNIST, CIFAR-10, ImageNet) by adversarially training the classifier against diffusion denoiser covariate shift.

Covariate-Shift-Robust Diffusion Denoised Smoothing

Novel technique introduced


Randomized smoothing is a well-established method for achieving certified robustness against l2-adversarial perturbations. By incorporating a denoiser before the base classifier, pretrained classifiers can be seamlessly integrated into randomized smoothing without significant performance degradation. Among existing methods, Diffusion Denoised Smoothing, where a pretrained denoising diffusion model serves as the denoiser, has produced state-of-the-art results. However, we show that employing a denoising diffusion model introduces a covariate shift via misestimation of the added noise, ultimately degrading the smoothed classifier's performance. To address this issue, we propose a novel adversarial objective function focused on the added noise of the denoising diffusion model. This approach is inspired by our understanding of the origin of the covariate shift. Our goal is to train the base classifier to ensure it is robust against the covariate shift introduced by the denoiser. Our method significantly improves certified accuracy across three standard classification benchmarks (MNIST, CIFAR-10, and ImageNet), achieving new state-of-the-art performance against l2-adversarial perturbations. Our implementation is publicly available at https://github.com/ahedayat/Robustifying-DDS-Against-Covariate-Shift.
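The denoise-then-classify pipeline the abstract describes can be sketched as follows. This is a minimal toy, not the paper's implementation: `toy_denoiser` and `toy_classifier` are hypothetical stand-ins (a real system would use a pretrained diffusion model and a trained base classifier), but the loop shows the denoised-smoothing prediction rule: perturb the input with Gaussian noise, denoise, classify, and take a majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_noisy):
    # Hypothetical stand-in for a diffusion model's one-shot denoising step.
    return np.clip(x_noisy, -1.0, 1.0)

def toy_classifier(x):
    # Hypothetical binary base classifier: sign of the first coordinate.
    return int(x[0] > 0)

def smoothed_predict(x, sigma=0.5, n_samples=1000):
    """Denoised smoothing: add Gaussian noise, denoise, classify, majority-vote."""
    votes = np.zeros(2, dtype=int)
    for _ in range(n_samples):
        x_noisy = x + sigma * rng.standard_normal(x.shape)
        votes[toy_classifier(toy_denoiser(x_noisy))] += 1
    return int(votes.argmax()), votes

pred, votes = smoothed_predict(np.array([0.8, 0.0]))
```

The vote counts are also what the certification step consumes: a lower confidence bound on the top class's vote fraction determines the certified l2 radius.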


Key Contributions

  • Identifies that diffusion denoised smoothing introduces a covariate shift via noise misestimation that degrades certified accuracy
  • Proposes a novel adversarial objective function targeting noise estimation errors to train the base classifier against the denoiser-induced covariate shift
  • Achieves new state-of-the-art certified accuracy under l2-adversarial perturbations on MNIST, CIFAR-10, and ImageNet
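The second contribution, an adversarial objective over the denoiser's noise estimate, can be illustrated with a deliberately simplified sketch. Everything here is an assumption for illustration: a linear-logistic surrogate classifier `w`, a single FGSM-style gradient step, and an l_inf budget `eps_budget` on the noise-estimate perturbation stand in for the paper's actual objective; only the idea (train the base classifier on the denoised input produced by a worst-case noise misestimation) is taken from the source.

```python
import numpy as np

sigma, eps_budget = 0.5, 0.1
w = np.array([1.0, -0.5])          # hypothetical linear classifier weights
x_noisy = np.array([0.9, 0.2])     # noisy input: x + sigma * true_noise
eps_hat = np.array([0.1, 0.3])     # diffusion model's noise estimate
y = 1.0                            # label in {-1, +1}

def logistic_loss(x_hat):
    return np.log1p(np.exp(-y * (w @ x_hat)))

# Denoised input under the current noise estimate.
x_hat = x_noisy - sigma * eps_hat

# Gradient of the loss w.r.t. the noise estimate (chain rule through x_hat,
# where d x_hat / d eps_hat = -sigma * I).
s = 1.0 / (1.0 + np.exp(y * (w @ x_hat)))   # sigmoid(-y * w @ x_hat)
grad_eps = sigma * y * s * w

# FGSM-style worst-case misestimation within the l_inf budget.
delta = eps_budget * np.sign(grad_eps)
x_hat_adv = x_noisy - sigma * (eps_hat + delta)

# Adversarial training would minimize the loss on x_hat_adv instead of x_hat.
```

For this linear surrogate the perturbed estimate strictly increases the loss, which is the signal the classifier is trained to withstand.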

🛡️ Threat Analysis

Input Manipulation Attack

The primary contribution is a defense against l2-adversarial perturbations via improved certified robustness: the paper strengthens diffusion-denoised randomized smoothing by training the base classifier to be robust against the covariate shift introduced by the denoiser.
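For context on what "certified accuracy" means here, the standard randomized-smoothing certificate (Cohen et al.) turns a lower confidence bound on the smoothed classifier's top-class probability into an l2 radius, R = sigma * Phi^{-1}(p_A). A minimal sketch using only the standard library (the input `p_a_lower` is assumed to come from a separate confidence-bound computation over the Monte Carlo votes):

```python
from statistics import NormalDist

def certified_radius(p_a_lower, sigma):
    """Randomized-smoothing certificate: R = sigma * Phi^{-1}(p_A)."""
    if p_a_lower <= 0.5:
        return 0.0  # no certificate when the top class is not a clear majority
    return sigma * NormalDist().inv_cdf(p_a_lower)

r = certified_radius(0.9, 0.5)
```

A larger smoothing sigma certifies larger radii but typically costs clean accuracy, which is why reducing the denoiser-induced covariate shift improves the accuracy/radius trade-off.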


Details

Domains
vision
Model Types
diffusion, cnn, transformer
Threat Tags
white_box, inference_time, digital, untargeted
Datasets
MNIST, CIFAR-10, ImageNet
Applications
image classification