defense 2026

Rectifying Adversarial Examples Using Their Vulnerabilities

Fumiya Morimoto , Ryuto Morita , Satoshi Ono

2 citations · 1 influential · 58 references · IEEE Access · Open Access


Published on arXiv

2601.00270

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

The proposed re-attack-based rectification method outperforms conventional rectification and input transformation baselines in stability across targeted and black-box adversarial attacks without requiring preliminary training or parameter adjustment.


Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed inputs, imperceptible to humans, that pose significant risks to security-dependent applications. Hence, extensive research has been undertaken to develop defense mechanisms that mitigate this threat. Most existing methods focus on discriminating AEs based on input sample features, emphasizing AE detection rather than recovering the category the sample belonged to before the attack. While some tasks may only require rejecting detected AEs, others necessitate identifying the correct original input category, such as traffic sign recognition in autonomous driving. The objective of this study is to propose a method for rectifying AEs to estimate the correct labels of their original inputs. Our method is based on re-attacking AEs to move them back across the decision boundary for accurate label prediction, effectively rectifying minimally perceptible AEs created using white-box attack methods. However, a challenge remains in rectifying AEs produced by black-box attacks, which lie far from the boundary, or AEs misclassified into low-confidence categories by targeted attacks. By adopting the straightforward approach of treating every input as an AE, the proposed method can address diverse attacks without requiring parameter adjustment or preliminary training. Results demonstrate that the proposed method exhibits consistent performance in rectifying AEs generated via various attack methods, including targeted and black-box attacks. Moreover, it outperforms conventional rectification and input transformation methods in terms of stability against various attacks.
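The core re-attack idea can be sketched on a toy differentiable classifier: treat the input as an AE, attack its current predicted label, and once the prediction flips, take the new label as the rectified estimate. Everything below (the `LinearClassifier`, `rectify_by_reattack`, the step size `eps`, and the step budget) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class LinearClassifier:
    """Toy linear softmax classifier standing in for a DNN."""
    def __init__(self, W, b):
        self.W, self.b = W, b

    def logits(self, x):
        return self.W @ x + self.b

    def predict(self, x):
        return int(np.argmax(self.logits(x)))

    def grad_wrt_input(self, x, label):
        # Gradient of the cross-entropy loss w.r.t. the input x for `label`:
        # dL/dx = W^T (softmax(Wx + b) - onehot(label))
        p = softmax(self.logits(x))
        onehot = np.zeros_like(p)
        onehot[label] = 1.0
        return self.W.T @ (p - onehot)

def rectify_by_reattack(model, x, eps=0.05, steps=50):
    """Re-attack x against its own predicted label (untargeted gradient
    ascent); the first label the prediction flips to is the rectified
    estimate of the original class."""
    label = model.predict(x)
    x_adv = x.copy()
    for _ in range(steps):
        g = x_adv  # placeholder name clarity: compute loss gradient next
        g = model.grad_wrt_input(x_adv, label)
        x_adv = x_adv + eps * np.sign(g)   # FGSM-style step away from `label`
        new_label = model.predict(x_adv)
        if new_label != label:
            return new_label
    return label  # no flip within budget: keep the current prediction

# Two-class toy model: class 0 iff x[0] > 0.
model = LinearClassifier(np.array([[1.0, 0.0], [-1.0, 0.0]]), np.zeros(2))
x_ae = np.array([-0.1, 0.0])       # "AE": barely across the boundary
print(model.predict(x_ae))         # misclassified as class 1
print(rectify_by_reattack(model, x_ae))  # rectified back to class 0
```

Because an AE typically sits just across the decision boundary (its fragility), the untargeted re-attack needs only a few small steps to cross back, which is what makes the training-free approach plausible.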


Key Contributions

  • Rectification method that re-attacks adversarial examples using their own fragility (proximity to decision boundaries) to recover correct pre-attack labels
  • Training-free and parameter-tuning-free approach that treats all inputs as AEs, enabling unrestricted re-attack without domain-specific preprocessing
  • Demonstrated consistent rectification performance across diverse attack types including white-box, black-box, and targeted attacks, outperforming conventional input transformation methods

🛡️ Threat Analysis

Input Manipulation Attack

Proposes a defense (rectification) against adversarial input manipulation attacks — re-attacks AEs using white-box gradient methods to push them across decision boundaries and restore correct classification, evaluated against FGSM, PGD, C&W, targeted, and black-box attacks.
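For context on the attacks evaluated against, a one-step FGSM on a toy logistic model shows how a small signed-gradient perturbation flips a prediction; the weights `w`, `b`, the input, and `eps` here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy logistic model: predict class 1 iff w·x + b > 0 (illustrative weights).
w, b = np.array([2.0, -1.0]), 0.0

def predict(x):
    return int(w @ x + b > 0)

def fgsm(x, y, eps):
    """One-step FGSM: move x by eps in the sign of the loss gradient.
    For binary cross-entropy, dL/dx = (sigmoid(w·x + b) - y) * w."""
    p = sigmoid(w @ x + b)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

x = np.array([0.2, 0.1])            # clean input, predicted class 1
x_ae = fgsm(x, predict(x), eps=0.2)  # bounded perturbation flips it to 0
```

PGD iterates this step with projection onto an eps-ball, and C&W instead optimizes a margin-based objective; the summarized defense re-uses the same gradient machinery, but aimed at the AE's own predicted label.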


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, black_box, inference_time, targeted, untargeted, digital
Applications
image classification, traffic sign recognition, autonomous driving