Defense · 2026

NADD: Amplifying Noise for Effective Diffusion-based Adversarial Purification

David D. Nguyen 1,2, The-Anh Ta 1, Yansong Gao 1, Alsharif Abuadbba 1

0 citations · 59 references · arXiv

Published on arXiv · 2601.01109

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves 44.23% robust accuracy on ImageNet under AutoAttack (ℓ∞=4/255), a +2.07% improvement over the previous best, at 1.08 seconds per sample (47× faster than prior state-of-the-art).

Novel technique introduced

NADD (with ring proximity correction)


The strategy of combining diffusion-based generative models with classifiers continues to demonstrate state-of-the-art performance on adversarial robustness benchmarks. Known as adversarial purification, this approach exploits a diffusion model's ability to identify high-density regions of the data distribution in order to purify adversarial perturbations from inputs. However, existing diffusion-based purification defenses are impractically slow and limited in robustness due to the low levels of noise used in the diffusion process. This low-noise design aims to preserve the semantic features of the original input, thereby minimizing utility loss for benign inputs. Our findings indicate that systematically amplifying noise throughout the diffusion process improves the robustness of adversarial purification. However, this approach presents a key challenge: noise levels cannot be increased arbitrarily without risking distortion of the input. To address this problem, we introduce high levels of noise during the forward process and propose the ring proximity correction to gradually eliminate adversarial perturbations whilst closely preserving the original data sample. As a second contribution, we propose a new stochastic sampling method which introduces additional noise during the reverse diffusion process to dilute adversarial perturbations. Without relying on gradient obfuscation, these contributions set a new robust accuracy record of 44.23% on ImageNet under AutoAttack ($\ell_{\infty}=4/255$), an improvement of +2.07% over the previous best work. Furthermore, our method reduces inference time to 1.08 seconds per sample on ImageNet, a $47\times$ improvement over the existing state-of-the-art approach, making it far more practical for real-world defensive scenarios.
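For intuition, below is a minimal sketch of what purification with amplified forward noise might look like in PyTorch. It assumes a pretrained noise-prediction model `eps_model(x_t, t)` and a precomputed cumulative schedule `alpha_bar`; since this summary does not specify the ring proximity correction, the final projection step is a hypothetical stand-in, not the paper's actual mechanism.

```python
# Minimal sketch of diffusion-based purification with amplified forward noise.
# Assumptions: a pretrained noise-prediction model `eps_model(x_t, t)` and a
# 1-D tensor `alpha_bar` of cumulative schedule products; the final projection
# is a HYPOTHETICAL stand-in for the paper's ring proximity correction.
import torch

def purify(x0, eps_model, alpha_bar, t_star, r=0.5):
    """Noise x0 up to an amplified timestep t_star, then denoise it back."""
    # Forward process: q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).
    abar = alpha_bar[t_star]
    x = abar.sqrt() * x0 + (1 - abar).sqrt() * torch.randn_like(x0)

    # Reverse process: standard DDPM ancestral sampling from t_star down to 0.
    for t in range(t_star, 0, -1):
        abar_t, abar_prev = alpha_bar[t], alpha_bar[t - 1]
        alpha_t = abar_t / abar_prev
        ts = torch.full((x.shape[0],), t, device=x.device)
        eps = eps_model(x, ts)
        mean = (x - (1 - alpha_t) / (1 - abar_t).sqrt() * eps) / alpha_t.sqrt()
        sigma = ((1 - abar_prev) / (1 - abar_t) * (1 - alpha_t)).sqrt()
        x = mean + sigma * torch.randn_like(x) if t > 1 else mean

        # Hypothetical proximity step: keep the iterate within distance r of
        # the re-noised original, so purification cannot drift semantically.
        anchor = abar_prev.sqrt() * x0
        diff = x - anchor
        norm = diff.flatten(1).norm(dim=1).clamp(min=1e-12)
        scale = (norm.clamp(max=r) / norm).view(-1, 1, 1, 1)
        x = anchor + diff * scale
    return x
```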


Key Contributions

  • Noise amplification strategy for diffusion-based purification that increases robustness without distorting inputs, via a novel 'ring proximity correction' mechanism
  • Stochastic sampling method that injects additional noise during the reverse diffusion process to dilute adversarial perturbations (see the sketch after this list)
  • Sets a new robustness record of 44.23% on ImageNet under AutoAttack (ℓ∞=4/255) with 47× faster inference (1.08s/sample) compared to prior state-of-the-art
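The stochastic sampling in the second bullet could plausibly be realized as below, assuming a DDIM-style deterministic update plus a `gamma`-scaled noise injection; all names and the exact injection form are illustrative, not taken from the paper.

```python
# Illustrative stochastic reverse step: a standard DDIM-style update followed
# by re-injection of fresh Gaussian noise. The `gamma`-scaled injection term
# is an assumption; the paper's exact sampler is not given in this summary.
import torch

def stochastic_reverse_step(x_t, t, eps_model, alpha_bar, gamma=0.05):
    abar_t, abar_prev = alpha_bar[t], alpha_bar[t - 1]
    ts = torch.full((x_t.shape[0],), t, device=x_t.device)
    eps = eps_model(x_t, ts)

    # Predict x0 from the noise estimate, then take the deterministic step.
    x0_hat = (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
    x_prev = abar_prev.sqrt() * x0_hat + (1 - abar_prev).sqrt() * eps

    # Extra stochasticity: fresh noise dilutes residual adversarial signal.
    return x_prev + gamma * (1 - abar_prev).sqrt() * torch.randn_like(x_t)
```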

🛡️ Threat Analysis

Input Manipulation Attack

The paper proposes a defense (adversarial purification) against adversarial examples, i.e., inputs crafted to cause misclassification at inference time. The method uses diffusion models to remove adversarial perturbations and is evaluated against AutoAttack on ImageNet.
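For context, robustness numbers under this threat model are typically produced with the `autoattack` package (Croce & Hein), whose API below is real; the purification wrapper and its arguments are hypothetical placeholders for a purifier wired in front of a classifier.

```python
# Sketch of the evaluation protocol using the real `autoattack` package.
# `PurifierClassifier`, `purifier`, and `classifier` are hypothetical
# placeholders; the AutoAttack API calls themselves are the standard ones.
import torch.nn as nn
from autoattack import AutoAttack

class PurifierClassifier(nn.Module):
    """End-to-end model: purify the input, then classify it."""
    def __init__(self, purifier, classifier):
        super().__init__()
        self.purifier = purifier
        self.classifier = classifier

    def forward(self, x):
        return self.classifier(self.purifier(x))

def evaluate_robustness(purifier, classifier, x_test, y_test):
    model = PurifierClassifier(purifier, classifier).eval()
    # Standard AutoAttack suite under the paper's threat model: L-inf, 4/255.
    adversary = AutoAttack(model, norm='Linf', eps=4/255, version='standard')
    return adversary.run_standard_evaluation(x_test, y_test, bs=32)
```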


Details

Domains
vision
Model Types
diffusion · cnn · transformer
Threat Tags
white_box · inference_time · digital · untargeted
Datasets
ImageNet
Applications
image classification · adversarial robustness benchmarks