Improving Sustainability of Adversarial Examples in Class-Incremental Learning
Taifeng Liu , Xinjing Liu , Liangqiu Dong , Yang Liu , Yilong Yang , Zhuo Ma
Published on arXiv (arXiv:2511.09088)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
SAE improves targeted attack success rate by an average of 31.28% over SOTA baselines after a 9-fold increase in the number of CIL classes on ResNet-32/CIFAR-100.
SAE
Novel technique introduced
Current adversarial examples (AEs) are typically designed for static models. However, with the wide adoption of Class-Incremental Learning (CIL), models are no longer static: they are updated with new data whose distribution and labels differ from the old data. As a result, existing AEs often fail after CIL updates due to significant domain drift. In this paper, we propose SAE to enhance the sustainability of AEs against CIL. The core idea of SAE is to make AE semantics robust to domain drift by pushing them closer to the target class while distinguishing them from all other classes. Achieving this is challenging, as relying solely on the initial CIL model to optimize AE semantics often leads to overfitting. To resolve this, we propose a Semantic Correction Module, which encourages the AE semantics to generalize by anchoring them to a visual-language model capable of producing universal semantics, and which incorporates the CIL model to correct the optimization direction, guiding the semantics closer to the target class. To further reduce fluctuations in AE semantics, we propose a Filtering-and-Augmentation Module, which first identifies non-target examples carrying target-class semantics in the latent space and then augments them to foster more stable semantics. Comprehensive experiments demonstrate that SAE outperforms baselines by an average of 31.28% when updated with a 9-fold increase in the number of classes.
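The abstract's core optimization can be illustrated with a toy sketch. This is not the paper's implementation: it uses a linear victim model (`logits = W @ x`), a linear latent map `E`, and a fixed `anchor` vector standing in for the visual-language model's universal target-class embedding; the loss combines targeted cross-entropy with a cosine term pulling the AE's latent semantics toward the anchor, optimized by PGD-style sign steps. All names and parameters here are hypothetical.

```python
import numpy as np

def semantic_targeted_pgd(x, W, anchor, E, target, eps=0.5, alpha=0.05,
                          steps=50, lam=1.0):
    """Toy sketch: targeted PGD on a linear model plus a semantic term.

    Loss = CE(W @ (x + delta), target) + lam * (1 - cos(E @ (x + delta), anchor)).
    `anchor` stands in for a universal target-class embedding from a
    visual-language model; `E` stands in for the victim's latent map.
    """
    delta = np.zeros_like(x)
    best_delta, best_loss = delta.copy(), np.inf
    for _ in range(steps):
        z = x + delta
        logits = W @ z
        p = np.exp(logits - logits.max())
        p /= p.sum()
        v = E @ z
        nv, na = np.linalg.norm(v), np.linalg.norm(anchor)
        cos = v @ anchor / (nv * na)
        loss = -np.log(p[target] + 1e-12) + lam * (1.0 - cos)
        if loss < best_loss:                       # keep the best iterate seen
            best_loss, best_delta = loss, delta.copy()
        # Analytic gradients for the linear stand-ins
        g_ce = W.T @ (p - np.eye(len(p))[target])  # d CE / d z
        g_cos = E.T @ (anchor / (nv * na)          # d cos / d z
                       - (v @ anchor) * v / (nv**3 * na))
        grad = g_ce - lam * g_cos                  # descend CE + lam*(1 - cos)
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return best_delta, best_loss
```

The semantic term is what distinguishes this from plain targeted PGD: even if the classifier head changes after a CIL update, an AE whose latent representation sits near a universal target-class embedding is more likely to keep its target-class semantics.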
Key Contributions
- First investigation of adversarial example sustainability under Class-Incremental Learning, identifying domain drift as the failure mechanism
- Semantic Correction Module using a visual-language model as a universal semantic anchor to prevent AE overfitting during CIL updates
- Filtering-and-Augmentation Module that identifies non-target examples carrying target-class semantics in the latent space and augments them to foster stable, generalizable AE semantics
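The filtering step in the last contribution can be sketched as a latent-space similarity test. This is an illustrative stand-in, not the paper's procedure: it flags non-target examples whose latent vectors sit close (by cosine similarity) to the target-class centroid, then augments them with Gaussian-noise copies in place of the paper's augmentations. Threshold, noise scale, and copy count are made-up parameters.

```python
import numpy as np

def filter_and_augment(latents, labels, target, thresh=0.8, copies=3,
                       noise=0.01, seed=0):
    """Toy sketch of the Filtering-and-Augmentation idea.

    1) Find non-target examples whose latent vectors are highly similar
       to the target-class centroid (they carry target-class semantics).
    2) Augment those examples (noisy copies as a simple stand-in).
    """
    rng = np.random.default_rng(seed)
    centroid = latents[labels == target].mean(axis=0)
    c = centroid / np.linalg.norm(centroid)
    non_target = np.flatnonzero(labels != target)
    # Cosine similarity of each non-target latent to the target centroid
    sims = latents[non_target] @ c / np.linalg.norm(latents[non_target], axis=1)
    confusing = non_target[sims > thresh]
    if len(confusing) == 0:
        return confusing, np.empty((0, latents.shape[1]))
    augmented = np.stack([latents[i] + noise * rng.normal(size=latents.shape[1])
                          for i in confusing for _ in range(copies)])
    return confusing, augmented
```

Under this reading, examples that straddle the target class's semantic region are the ones most likely to destabilize AE semantics across CIL updates, so they are the ones worth augmenting.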
🛡️ Threat Analysis
SAE proposes a novel method for creating adversarial examples (gradient-based, targeted perturbations) that remain effective at causing misclassification at inference time even after the victim model is updated via Class-Incremental Learning. The entire contribution is improved evasion-attack durability, which is core input manipulation.
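The threat being measured is easy to state concretely. The toy check below (hypothetical weights, not from the paper) shows what "sustainability" means operationally: an AE crafted against the old model is re-evaluated after a simulated CIL update that appends new class heads, and the attack succeeds only if the target class still wins the enlarged argmax.

```python
import numpy as np

def targeted_success(W, x, target):
    """True if the (possibly updated) linear model classifies x as target."""
    return int(np.argmax(W @ x)) == target

# Toy victim: 3 old classes over 4 features (illustrative weights only).
W_old = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
x_adv = np.array([0.1, 0.1, 2.0, 0.0])   # AE crafted to hit target class 2

# Simulated CIL update: two new class heads appended with modest weights.
W_new = np.vstack([W_old, [[0.2, 0.2, 0.1, 0.3],
                           [0.1, 0.0, 0.2, 0.4]]])

before = targeted_success(W_old, x_adv, 2)   # attack works pre-update
after = targeted_success(W_new, x_adv, 2)    # and survives the update
```

Real CIL updates also shift the shared feature extractor, which is exactly the domain drift that defeats ordinary AEs; this sketch only captures the evaluation protocol, not that drift.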