Boosting Adversarial Transferability via Residual Perturbation Attack

Jinjia Peng 1, Zeze Tao 1, Huibing Wang 2, Meng Wang 3, Yang Wang 3

Published on arXiv (2508.05689)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

ResPA achieves better adversarial transferability than existing typical transfer-based attack methods, with further gains when combined with input transformation techniques.

ResPA (Residual Perturbation Attack)

Novel technique introduced


Abstract

Deep neural networks are susceptible to adversarial examples: imperceptible perturbations that induce incorrect predictions. Transfer-based attacks craft adversarial examples on a surrogate model and transfer them to target models in black-box scenarios. Recent studies reveal that adversarial examples lying in flat regions of the loss landscape transfer better because they alleviate overfitting to the surrogate model. However, prior work overlooks the influence of the perturbation direction, resulting in limited transferability. In this paper, we propose a novel attack method, named Residual Perturbation Attack (ResPA), which relies on the residual gradient as the perturbation direction to guide adversarial examples toward flat regions of the loss function. Specifically, ResPA applies an exponential moving average to the input gradients to obtain the first moment as a reference gradient, which encodes the direction of historical gradients. Rather than relying solely on the current gradient, which reflects only local flatness, ResPA considers the residual between the current gradient and the reference gradient to capture changes in the global perturbation direction. Experimental results demonstrate that ResPA transfers better than existing typical transfer-based attack methods, and its transferability can be further improved by combining ResPA with current input transformation methods. The code is available at https://github.com/ZezeTao/ResPA.


Key Contributions

  • Proposes ResPA, which uses exponential moving average of input gradients (first moment) as a reference gradient to capture historical perturbation direction
  • Introduces a residual gradient direction (difference between current and reference gradient) to guide adversarial examples toward flat loss regions with a global rather than local perspective
  • Demonstrates that ResPA achieves superior transferability over existing transfer-based attacks and further improves when combined with input transformation methods
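The contributions above can be sketched as an iterative attack loop. This is a hedged illustration based only on the summary, not the authors' released code: the function name `respa_step`, the hyperparameters (`mu`, `alpha`, `eps`), and the L∞ projection are assumptions chosen to mirror common FGSM-style transfer attacks.

```python
import numpy as np

def respa_step(x, x_orig, grad, ema_grad, mu=0.9, alpha=2/255, eps=8/255):
    """One illustrative ResPA-style iteration (details assumed, not from the paper).

    x        : current adversarial example
    x_orig   : clean input (centre of the L-inf epsilon-ball)
    grad     : loss gradient w.r.t. x from the surrogate model
    ema_grad : exponential moving average of past gradients (reference gradient)
    """
    ema_grad = mu * ema_grad + (1 - mu) * grad    # first moment / reference gradient
    residual = grad - ema_grad                    # residual perturbation direction
    x = x + alpha * np.sign(residual)             # signed gradient step along the residual
    x = x_orig + np.clip(x - x_orig, -eps, eps)   # project back into the epsilon-ball
    return np.clip(x, 0.0, 1.0), ema_grad         # keep a valid pixel range

# Toy usage: a random "gradient" stands in for a real surrogate model's gradient.
rng = np.random.default_rng(0)
x0 = rng.random((3, 4, 4))
x, m = x0.copy(), np.zeros_like(x0)
for _ in range(10):
    g = rng.standard_normal(x0.shape)             # placeholder surrogate gradient
    x, m = respa_step(x, x0, g, m)
print(float(np.max(np.abs(x - x0))))              # perturbation stays within eps
```

In a real attack, `g` would come from backpropagating a classification loss through the surrogate model, and the resulting `x` would be evaluated against unseen black-box target models.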

🛡️ Threat Analysis

Input Manipulation Attack

Directly proposes a gradient-based adversarial perturbation attack (ResPA) that causes misclassification at inference time via imperceptible perturbations transferred across models in a black-box setting.


Details

Domains
vision
Model Types
CNN, Transformer
Threat Tags
black_box, inference_time, untargeted, digital
Datasets
ImageNet, CIFAR-10
Applications
image classification