Defense · 2025

Robustifying Diffusion-Denoised Smoothing Against Covariate Shift

Ali Hedayatnia , Mostafa Tavassolipour , Babak Nadjar Araabi , Abdol-Hossein Vahabie



Published on arXiv (arXiv:2509.10913)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves new state-of-the-art certified accuracy against l2-adversarial perturbations on all three benchmarks (MNIST, CIFAR-10, ImageNet) by adversarially training the classifier against diffusion denoiser covariate shift.

Covariate-Shift-Robust Diffusion Denoised Smoothing

Novel technique introduced


Randomized smoothing is a well-established method for achieving certified robustness against l2-adversarial perturbations. By incorporating a denoiser before the base classifier, pretrained classifiers can be seamlessly integrated into randomized smoothing without significant performance degradation. Among existing methods, Diffusion Denoised Smoothing, where a pretrained denoising diffusion model serves as the denoiser, has produced state-of-the-art results. However, we show that employing a denoising diffusion model introduces a covariate shift via misestimation of the added noise, ultimately degrading the smoothed classifier's performance. To address this issue, we propose a novel adversarial objective function focused on the added noise of the denoising diffusion model. This approach is inspired by our understanding of the origin of the covariate shift. Our goal is to train the base classifier to ensure it is robust against the covariate shift introduced by the denoiser. Our method significantly improves certified accuracy across three standard classification benchmarks (MNIST, CIFAR-10, and ImageNet), achieving new state-of-the-art performance against l2-adversarial perturbations. Our implementation is publicly available at https://github.com/ahedayat/Robustifying-DDS-Against-Covariate-Shift.
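The denoise-then-classify pipeline the abstract describes can be sketched as follows. This is a minimal toy, not the paper's implementation: `toy_denoiser` and `toy_classifier` are hypothetical stand-ins (a real system would use a pretrained diffusion model and a trained base classifier), but the loop shows the denoised-smoothing prediction rule: perturb the input with Gaussian noise, denoise, classify, and take a majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_noisy):
    # Hypothetical stand-in for a diffusion model's one-shot denoising step.
    return np.clip(x_noisy, -1.0, 1.0)

def toy_classifier(x):
    # Hypothetical binary base classifier: sign of the first coordinate.
    return int(x[0] > 0)

def smoothed_predict(x, sigma=0.5, n_samples=1000):
    """Denoised smoothing: add Gaussian noise, denoise, classify, majority-vote."""
    votes = np.zeros(2, dtype=int)
    for _ in range(n_samples):
        x_noisy = x + sigma * rng.standard_normal(x.shape)
        votes[toy_classifier(toy_denoiser(x_noisy))] += 1
    return int(votes.argmax()), votes

pred, votes = smoothed_predict(np.array([0.8, 0.0]))
```

The vote counts are also what the certification step consumes: a lower confidence bound on the top class's vote fraction determines the certified l2 radius.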


Key Contributions

  • Identifies that diffusion denoised smoothing introduces a covariate shift via noise misestimation that degrades certified accuracy
  • Proposes a novel adversarial objective function targeting noise estimation errors to train the base classifier against the denoiser-induced covariate shift
  • Achieves new state-of-the-art certified accuracy under l2-adversarial perturbations on MNIST, CIFAR-10, and ImageNet
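The second contribution, an adversarial objective over the denoiser's noise estimate, can be illustrated with a deliberately simplified sketch. Everything here is an assumption for illustration: a linear-logistic surrogate classifier `w`, a single FGSM-style gradient step, and an l_inf budget `eps_budget` on the noise-estimate perturbation stand in for the paper's actual objective; only the idea (train the base classifier on the denoised input produced by a worst-case noise misestimation) is taken from the source.

```python
import numpy as np

sigma, eps_budget = 0.5, 0.1
w = np.array([1.0, -0.5])          # hypothetical linear classifier weights
x_noisy = np.array([0.9, 0.2])     # noisy input: x + sigma * true_noise
eps_hat = np.array([0.1, 0.3])     # diffusion model's noise estimate
y = 1.0                            # label in {-1, +1}

def logistic_loss(x_hat):
    return np.log1p(np.exp(-y * (w @ x_hat)))

# Denoised input under the current noise estimate.
x_hat = x_noisy - sigma * eps_hat

# Gradient of the loss w.r.t. the noise estimate (chain rule through x_hat,
# where d x_hat / d eps_hat = -sigma * I).
s = 1.0 / (1.0 + np.exp(y * (w @ x_hat)))   # sigmoid(-y * w @ x_hat)
grad_eps = sigma * y * s * w

# FGSM-style worst-case misestimation within the l_inf budget.
delta = eps_budget * np.sign(grad_eps)
x_hat_adv = x_noisy - sigma * (eps_hat + delta)

# Adversarial training would minimize the loss on x_hat_adv instead of x_hat.
```

For this linear surrogate the perturbed estimate strictly increases the loss, which is the signal the classifier is trained to withstand.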

🛡️ Threat Analysis

Input Manipulation Attack

The primary contribution is a defense against l2-adversarial perturbations via improved certified robustness: the paper strengthens diffusion-denoised randomized smoothing by training the base classifier to be robust against the covariate shift introduced by the denoiser.
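For context on what "certified accuracy" means here, the standard randomized-smoothing certificate (Cohen et al.) turns a lower confidence bound on the smoothed classifier's top-class probability into an l2 radius, R = sigma * Phi^{-1}(p_A). A minimal sketch using only the standard library (the input `p_a_lower` is assumed to come from a separate confidence-bound computation over the Monte Carlo votes):

```python
from statistics import NormalDist

def certified_radius(p_a_lower, sigma):
    """Randomized-smoothing certificate: R = sigma * Phi^{-1}(p_A)."""
    if p_a_lower <= 0.5:
        return 0.0  # no certificate when the top class is not a clear majority
    return sigma * NormalDist().inv_cdf(p_a_lower)

r = certified_radius(0.9, 0.5)
```

A larger smoothing sigma certifies larger radii but typically costs clean accuracy, which is why reducing the denoiser-induced covariate shift improves the accuracy/radius trade-off.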


Details

Domains
vision
Model Types
diffusion, cnn, transformer
Threat Tags
white_box, inference_time, digital, untargeted
Datasets
MNIST, CIFAR-10, ImageNet
Applications
image classification