Defense · 2025

NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations

Junjie Nan 1,2, Jianing Li 1,2, Wei Chen 1,2, Mingkun Zhang 1,2, Xueqi Cheng 1,2

0 citations · 48 references · arXiv

Published on arXiv · 2510.14025

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

NAPPure achieves 70.8% robust accuracy on GTSRB against non-additive perturbations, outperforming DiffPure (43.2%) and adversarial training (33.8%) by large margins.

NAPPure

Novel technique introduced


Adversarial purification has achieved great success in combating adversarial image perturbations, which are usually assumed to be additive. However, non-additive adversarial perturbations such as blur, occlusion, and distortion are also common in the real world. Under such perturbations, existing adversarial purification methods are far less effective, since their designs assume additivity. In this paper, we propose an extended adversarial purification framework named NAPPure, which can also handle non-additive perturbations. Specifically, we first model the generation process of an adversarial image, and then disentangle the underlying clean image and the perturbation parameters through likelihood maximization. Experiments on the GTSRB and CIFAR-10 datasets show that NAPPure significantly boosts the robustness of image classification models against non-additive perturbations.


Key Contributions

  • NAPPure framework that models image generation as a transformation from clean image + perturbation parameters, then disentangles them via likelihood maximization with a pretrained diffusion model
  • Plug-and-play implementations for three non-additive perturbation types: convolution-based blur, patch-based occlusion, and flow-field-based distortion
  • Demonstrated 70.8% robust accuracy on GTSRB against non-additive perturbations, vs 43.2% for DiffPure and 33.8% for adversarial training
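The disentanglement idea in the first bullet can be sketched on a toy 1D signal. Here a parametric step signal stands in for the diffusion-model prior, a Gaussian blur stands in for the non-additive perturbation, and grid search stands in for gradient-based likelihood maximization; all helper names and parameter values below are illustrative, not taken from the paper:

```python
import numpy as np

N = 128  # length of the toy 1D "image"

def gaussian_kernel(sigma, size=N):
    # normalized Gaussian blur kernel, centered at size // 2
    x = np.arange(size) - size // 2
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(signal, sigma):
    # non-additive perturbation: circular convolution via FFT
    # (kernel rolled so its center sits at index 0)
    k = np.roll(gaussian_kernel(sigma), -(N // 2))
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(k)))

def step(pos):
    # crude "clean-signal prior": a unit step with one free parameter
    s = np.zeros(N)
    s[pos:] = 1.0
    return s

# "adversarial" observation: a clean step blurred with unknown sigma
true_pos, true_sigma = 50, 4.0
y = blur(step(true_pos), true_sigma)

# disentangle clean-signal and perturbation parameters by minimizing the
# reconstruction residual (a stand-in for likelihood maximization)
_, est_pos, est_sigma = min(
    (np.sum((blur(step(p), s) - y) ** 2), p, s)
    for p in range(30, 70)
    for s in np.arange(1.0, 8.0, 0.5)
)
# est_pos == 50 and est_sigma == 4.0: both the clean signal and the
# perturbation parameter are recovered jointly
```

The point of the sketch is that once the perturbation operator is modeled explicitly (here, blur with an unknown width), the clean signal and the perturbation parameters can be recovered jointly from the observation; NAPPure replaces the parametric step family with a pretrained diffusion prior and the grid search with likelihood maximization.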

🛡️ Threat Analysis

Input Manipulation Attack

Proposes a defense (adversarial purification framework) against adversarial input manipulation attacks — specifically non-additive perturbations (blur, occlusion, distortion) that cause misclassification at inference time.


Details

Domains
vision
Model Types
cnn, diffusion
Threat Tags
inference_time, digital
Datasets
GTSRB, CIFAR-10
Applications
image classification, traffic sign recognition