Improving Sustainability of Adversarial Examples in Class-Incremental Learning
Taifeng Liu , Xinjing Liu , Liangqiu Dong , Yang Liu , Yilong Yang , Zhuo Ma
Published on arXiv (arXiv:2511.09088)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
SAE improves targeted attack success rate by an average of 31.28% over SOTA baselines after a 9-fold increase in the number of CIL classes on ResNet-32/CIFAR-100.
SAE
Novel technique introduced
Current adversarial examples (AEs) are typically designed for static models. However, with the wide adoption of Class-Incremental Learning (CIL), models are no longer static: they are updated with new data whose distribution and labels differ from the old data. As a result, existing AEs often fail after CIL updates due to significant domain drift. In this paper, we propose SAE to enhance the sustainability of AEs against CIL. The core idea of SAE is to make AE semantics robust to domain drift by pushing them closer to the target class while distinguishing them from all other classes. Achieving this is challenging, as relying solely on the initial CIL model to optimize AE semantics often leads to overfitting. To resolve this, we propose a Semantic Correction Module, which encourages the AE semantics to generalize by anchoring them to a visual-language model capable of producing universal semantics, and which incorporates the CIL model to correct the optimization direction, guiding the semantics closer to the target class. To further reduce fluctuations in AE semantics, we propose a Filtering-and-Augmentation Module, which first identifies non-target examples carrying target-class semantics in the latent space and then augments them to foster more stable semantics. Comprehensive experiments demonstrate that SAE outperforms baselines by an average of 31.28% when updated with a 9-fold increase in the number of classes.
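The abstract's core optimization can be illustrated with a toy sketch. This is not the paper's implementation: it uses a linear victim model (`logits = W @ x`), a linear latent map `E`, and a fixed `anchor` vector standing in for the visual-language model's universal target-class embedding; the loss combines targeted cross-entropy with a cosine term pulling the AE's latent semantics toward the anchor, optimized by PGD-style sign steps. All names and parameters here are hypothetical.

```python
import numpy as np

def semantic_targeted_pgd(x, W, anchor, E, target, eps=0.5, alpha=0.05,
                          steps=50, lam=1.0):
    """Toy sketch: targeted PGD on a linear model plus a semantic term.

    Loss = CE(W @ (x + delta), target) + lam * (1 - cos(E @ (x + delta), anchor)).
    `anchor` stands in for a universal target-class embedding from a
    visual-language model; `E` stands in for the victim's latent map.
    """
    delta = np.zeros_like(x)
    best_delta, best_loss = delta.copy(), np.inf
    for _ in range(steps):
        z = x + delta
        logits = W @ z
        p = np.exp(logits - logits.max())
        p /= p.sum()
        v = E @ z
        nv, na = np.linalg.norm(v), np.linalg.norm(anchor)
        cos = v @ anchor / (nv * na)
        loss = -np.log(p[target] + 1e-12) + lam * (1.0 - cos)
        if loss < best_loss:                       # keep the best iterate seen
            best_loss, best_delta = loss, delta.copy()
        # Analytic gradients for the linear stand-ins
        g_ce = W.T @ (p - np.eye(len(p))[target])  # d CE / d z
        g_cos = E.T @ (anchor / (nv * na)          # d cos / d z
                       - (v @ anchor) * v / (nv**3 * na))
        grad = g_ce - lam * g_cos                  # descend CE + lam*(1 - cos)
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return best_delta, best_loss
```

The semantic term is what distinguishes this from plain targeted PGD: even if the classifier head changes after a CIL update, an AE whose latent representation sits near a universal target-class embedding is more likely to keep its target-class semantics.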
Key Contributions
- First investigation of adversarial example sustainability under Class-Incremental Learning, identifying domain drift as the failure mechanism
- Semantic Correction Module using a visual-language model as a universal semantic anchor to prevent AE overfitting during CIL updates
- Filtering-and-Augmentation Module that identifies non-target examples carrying target-class semantics in the latent space and augments them to foster stable, generalizable AE semantics
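The filtering step in the last contribution can be sketched as a latent-space similarity test. This is an illustrative stand-in, not the paper's procedure: it flags non-target examples whose latent vectors sit close (by cosine similarity) to the target-class centroid, then augments them with Gaussian-noise copies in place of the paper's augmentations. Threshold, noise scale, and copy count are made-up parameters.

```python
import numpy as np

def filter_and_augment(latents, labels, target, thresh=0.8, copies=3,
                       noise=0.01, seed=0):
    """Toy sketch of the Filtering-and-Augmentation idea.

    1) Find non-target examples whose latent vectors are highly similar
       to the target-class centroid (they carry target-class semantics).
    2) Augment those examples (noisy copies as a simple stand-in).
    """
    rng = np.random.default_rng(seed)
    centroid = latents[labels == target].mean(axis=0)
    c = centroid / np.linalg.norm(centroid)
    non_target = np.flatnonzero(labels != target)
    # Cosine similarity of each non-target latent to the target centroid
    sims = latents[non_target] @ c / np.linalg.norm(latents[non_target], axis=1)
    confusing = non_target[sims > thresh]
    if len(confusing) == 0:
        return confusing, np.empty((0, latents.shape[1]))
    augmented = np.stack([latents[i] + noise * rng.normal(size=latents.shape[1])
                          for i in confusing for _ in range(copies)])
    return confusing, augmented
```

Under this reading, examples that straddle the target class's semantic region are the ones most likely to destabilize AE semantics across CIL updates, so they are the ones worth augmenting.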
🛡️ Threat Analysis
SAE proposes a novel method for creating adversarial examples (gradient-based, targeted perturbations) that remain effective at causing misclassification at inference time even after the victim model is updated via Class-Incremental Learning. The entire contribution is improved evasion-attack durability, which is core input manipulation.
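The threat being measured is easy to state concretely. The toy check below (hypothetical weights, not from the paper) shows what "sustainability" means operationally: an AE crafted against the old model is re-evaluated after a simulated CIL update that appends new class heads, and the attack succeeds only if the target class still wins the enlarged argmax.

```python
import numpy as np

def targeted_success(W, x, target):
    """True if the (possibly updated) linear model classifies x as target."""
    return int(np.argmax(W @ x)) == target

# Toy victim: 3 old classes over 4 features (illustrative weights only).
W_old = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
x_adv = np.array([0.1, 0.1, 2.0, 0.0])   # AE crafted to hit target class 2

# Simulated CIL update: two new class heads appended with modest weights.
W_new = np.vstack([W_old, [[0.2, 0.2, 0.1, 0.3],
                           [0.1, 0.0, 0.2, 0.4]]])

before = targeted_success(W_old, x_adv, 2)   # attack works pre-update
after = targeted_success(W_new, x_adv, 2)    # and survives the update
```

Real CIL updates also shift the shared feature extractor, which is exactly the domain drift that defeats ordinary AEs; this sketch only captures the evaluation protocol, not that drift.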