Defense · 2026

NADD: Amplifying Noise for Effective Diffusion-based Adversarial Purification

David D. Nguyen 1,2, The-Anh Ta 1, Yansong Gao 1, Alsharif Abuadbba 1

0 citations · 59 references · arXiv

Published on arXiv · 2601.01109

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves 44.23% robust accuracy on ImageNet under AutoAttack (ℓ∞=4/255), a +2.07% improvement over the previous best, at 1.08 seconds per sample (47× faster than prior state-of-the-art).

Novel technique introduced

NADD (with ring proximity correction)


The strategy of combining diffusion-based generative models with classifiers continues to demonstrate state-of-the-art performance on adversarial robustness benchmarks. Known as adversarial purification, this approach exploits a diffusion model's ability to identify high-density regions of the data distribution in order to purify adversarial perturbations from inputs. However, existing diffusion-based purification defenses are impractically slow and limited in robustness due to the low levels of noise used in the diffusion process. This low-noise design aims to preserve the semantic features of the original input, thereby minimizing utility loss for benign inputs. Our findings indicate that systematically amplifying noise throughout the diffusion process improves the robustness of adversarial purification. However, this approach presents a key challenge: noise levels cannot be increased arbitrarily without risking distortion of the input. To address this problem, we introduce high levels of noise during the forward process and propose the ring proximity correction to gradually eliminate adversarial perturbations whilst closely preserving the original data sample. As a second contribution, we propose a new stochastic sampling method which introduces additional noise during the reverse diffusion process to dilute adversarial perturbations. Without relying on gradient obfuscation, these contributions set a new robust accuracy record of 44.23% on ImageNet under AutoAttack ($\ell_{\infty}=4/255$), an improvement of +2.07% over the previous best work. Furthermore, our method reduces inference time to 1.08 seconds per sample on ImageNet, a $47\times$ improvement over the existing state-of-the-art approach, making it far more practical for real-world defensive scenarios.
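For intuition, below is a minimal sketch of what purification with amplified forward noise might look like in PyTorch. It assumes a pretrained noise-prediction model `eps_model(x_t, t)` and a precomputed cumulative schedule `alpha_bar`; since this summary does not specify the ring proximity correction, the final projection step is a hypothetical stand-in, not the paper's actual mechanism.

```python
# Minimal sketch of diffusion-based purification with amplified forward noise.
# Assumptions: a pretrained noise-prediction model `eps_model(x_t, t)` and a
# 1-D tensor `alpha_bar` of cumulative schedule products; the final projection
# is a HYPOTHETICAL stand-in for the paper's ring proximity correction.
import torch

def purify(x0, eps_model, alpha_bar, t_star, r=0.5):
    """Noise x0 up to an amplified timestep t_star, then denoise it back."""
    # Forward process: q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I).
    abar = alpha_bar[t_star]
    x = abar.sqrt() * x0 + (1 - abar).sqrt() * torch.randn_like(x0)

    # Reverse process: standard DDPM ancestral sampling from t_star down to 0.
    for t in range(t_star, 0, -1):
        abar_t, abar_prev = alpha_bar[t], alpha_bar[t - 1]
        alpha_t = abar_t / abar_prev
        ts = torch.full((x.shape[0],), t, device=x.device)
        eps = eps_model(x, ts)
        mean = (x - (1 - alpha_t) / (1 - abar_t).sqrt() * eps) / alpha_t.sqrt()
        sigma = ((1 - abar_prev) / (1 - abar_t) * (1 - alpha_t)).sqrt()
        x = mean + sigma * torch.randn_like(x) if t > 1 else mean

        # Hypothetical proximity step: keep the iterate within distance r of
        # the re-noised original, so purification cannot drift semantically.
        anchor = abar_prev.sqrt() * x0
        diff = x - anchor
        norm = diff.flatten(1).norm(dim=1).clamp(min=1e-12)
        scale = (norm.clamp(max=r) / norm).view(-1, 1, 1, 1)
        x = anchor + diff * scale
    return x
```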


Key Contributions

  • Noise amplification strategy for diffusion-based purification that increases robustness without distorting inputs, via a novel 'ring proximity correction' mechanism
  • Stochastic sampling method that injects additional noise during the reverse diffusion process to dilute adversarial perturbations (see the sketch after this list)
  • Sets a new robustness record of 44.23% on ImageNet under AutoAttack (ℓ∞=4/255) with 47× faster inference (1.08s/sample) compared to prior state-of-the-art
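The stochastic sampling in the second bullet could plausibly be realized as below, assuming a DDIM-style deterministic update plus a `gamma`-scaled noise injection; all names and the exact injection form are illustrative, not taken from the paper.

```python
# Illustrative stochastic reverse step: a standard DDIM-style update followed
# by re-injection of fresh Gaussian noise. The `gamma`-scaled injection term
# is an assumption; the paper's exact sampler is not given in this summary.
import torch

def stochastic_reverse_step(x_t, t, eps_model, alpha_bar, gamma=0.05):
    abar_t, abar_prev = alpha_bar[t], alpha_bar[t - 1]
    ts = torch.full((x_t.shape[0],), t, device=x_t.device)
    eps = eps_model(x_t, ts)

    # Predict x0 from the noise estimate, then take the deterministic step.
    x0_hat = (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()
    x_prev = abar_prev.sqrt() * x0_hat + (1 - abar_prev).sqrt() * eps

    # Extra stochasticity: fresh noise dilutes residual adversarial signal.
    return x_prev + gamma * (1 - abar_prev).sqrt() * torch.randn_like(x_t)
```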

🛡️ Threat Analysis

Input Manipulation Attack

The paper proposes a defense (adversarial purification) against adversarial examples, i.e., inputs crafted to cause misclassification at inference time. The method uses diffusion models to remove adversarial perturbations and is evaluated against AutoAttack on ImageNet.
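For context, robustness numbers under this threat model are typically produced with the `autoattack` package (Croce & Hein), whose API below is real; the purification wrapper and its arguments are hypothetical placeholders for a purifier wired in front of a classifier.

```python
# Sketch of the evaluation protocol using the real `autoattack` package.
# `PurifierClassifier`, `purifier`, and `classifier` are hypothetical
# placeholders; the AutoAttack API calls themselves are the standard ones.
import torch.nn as nn
from autoattack import AutoAttack

class PurifierClassifier(nn.Module):
    """End-to-end model: purify the input, then classify it."""
    def __init__(self, purifier, classifier):
        super().__init__()
        self.purifier = purifier
        self.classifier = classifier

    def forward(self, x):
        return self.classifier(self.purifier(x))

def evaluate_robustness(purifier, classifier, x_test, y_test):
    model = PurifierClassifier(purifier, classifier).eval()
    # Standard AutoAttack suite under the paper's threat model: L-inf, 4/255.
    adversary = AutoAttack(model, norm='Linf', eps=4/255, version='standard')
    return adversary.run_standard_evaluation(x_test, y_test, bs=32)
```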


Details

Domains
vision
Model Types
diffusion · cnn · transformer
Threat Tags
white_box · inference_time · digital · untargeted
Datasets
ImageNet
Applications
image classification · adversarial robustness benchmarks