NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan, Jianing Li, Wei Chen, Mingkun Zhang, Xueqi Cheng
Published on arXiv (arXiv:2510.14025)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
NAPPure achieves 70.8% robust accuracy on GTSRB against non-additive perturbations, outperforming DiffPure (43.2%) and adversarial training (33.8%) by large margins.
NAPPure
Novel technique introduced
Adversarial purification has achieved great success in combating adversarial image perturbations, which are usually assumed to be additive. However, non-additive adversarial perturbations such as blur, occlusion, and distortion are also common in the real world. Under such perturbations, existing adversarial purification methods are far less effective, since they are designed around the additive assumption. In this paper, we propose an extended adversarial purification framework named NAPPure, which also handles non-additive perturbations. Specifically, we first model the generation process of an adversarial image, and then disentangle the underlying clean image and the perturbation parameters through likelihood maximization. Experiments on the GTSRB and CIFAR-10 datasets show that NAPPure significantly boosts the robustness of image classification models against non-additive perturbations.
Key Contributions
- NAPPure framework that models image generation as a transformation from clean image + perturbation parameters, then disentangles them via likelihood maximization with a pretrained diffusion model
- Plug-and-play implementations for three non-additive perturbation types: convolution-based blur, patch-based occlusion, and flow-field-based distortion
- Demonstrated 70.8% robust accuracy on GTSRB against non-additive perturbations, vs 43.2% for DiffPure and 33.8% for adversarial training
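The disentanglement idea in the contributions above can be sketched in a toy 1D setting: the observation is produced by a parameterized non-additive transform (here, convolution-based blur), and the perturbation parameter is recovered by maximizing the observation likelihood given a clean-signal estimate. This is an illustrative sketch only, not the paper's implementation: the grid search, the Gaussian observation model, and supplying the true clean signal in place of a sample from a pretrained diffusion model are all simplifying assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius=8):
    """Normalized discrete Gaussian kernel; sigma is the perturbation parameter."""
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(x, sigma):
    """Non-additive perturbation g(x; sigma): convolution-based blur."""
    return np.convolve(x, gaussian_kernel(sigma), mode="same")

def neg_log_likelihood(y, x_hat, sigma):
    """Under a Gaussian observation model, -log p(y | x, sigma) is
    proportional to the squared reconstruction error ||g(x; sigma) - y||^2."""
    return float(np.sum((blur(x_hat, sigma) - y) ** 2))

def estimate_sigma(y, x_hat, grid=np.linspace(0.5, 4.0, 71)):
    """Disentangle the perturbation parameter by likelihood maximization:
    choose the sigma whose blur best explains the observation, given a
    clean-signal estimate x_hat. In NAPPure that estimate would come from
    a pretrained diffusion model; here it is supplied directly."""
    losses = [neg_log_likelihood(y, x_hat, s) for s in grid]
    return float(grid[int(np.argmin(losses))])

# Toy example: a step signal blurred with sigma = 2.0
x_true = np.zeros(64)
x_true[32:] = 1.0
y = blur(x_true, 2.0)            # adversarially blurred observation
sigma_hat = estimate_sigma(y, x_true)
```

Once the perturbation parameters are identified, the transform can be inverted (or the clean image re-estimated under the diffusion prior), which is what makes the approach plug-and-play across blur, occlusion, and distortion: only `blur` and its parameterization change.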
🛡️ Threat Analysis
Proposes a defense (adversarial purification framework) against adversarial input manipulation attacks — specifically non-additive perturbations (blur, occlusion, distortion) that cause misclassification at inference time.