Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning
Published on arXiv: 2508.08920
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Stage-transferred adversarial attacks achieve attack success rates comparable to direct white-box attacks, and existing adversarial training defenses for continual learning fail to adequately defend against them.
CSAT (Cross-Stage Adversarial Transferability)
Novel technique introduced
Class-incremental continual learning addresses catastrophic forgetting by enabling classification models to preserve knowledge of previously learned classes while acquiring new ones. However, the vulnerability of these models to adversarial attacks during this process has not been sufficiently investigated. In this paper, we present the first exploration of vulnerability to stage-transferred attacks, i.e., attacks in which an adversarial example generated using the model at an earlier stage is used to attack the model at a later stage. Our findings reveal that continual learning methods are highly susceptible to these attacks, raising a serious security issue. We explain this phenomenon through model similarity between stages and gradual robustness degradation. Additionally, we find that existing adversarial training-based defense methods are not sufficiently effective against stage-transferred attacks. Code is available at https://github.com/mcml-official/CSAT.
Key Contributions
- First empirical study of stage-transferred adversarial attacks in class-incremental continual learning, showing earlier-stage models serve as effective surrogates for attacking later-stage models
- Mechanistic analysis explaining cross-stage transferability through model similarity between training stages and gradual robustness degradation across stages
- Evaluation showing that existing adversarial training-based defenses for continual learning are insufficient to mitigate stage-transferred attacks
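The second contribution attributes transferability to model similarity between adjacent training stages. As an illustration only (the paper may use a different metric), one standard way to quantify representation similarity is linear centered kernel alignment (CKA); the feature matrices below are synthetic stand-ins for activations extracted from two hypothetical stages:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between feature matrices of shape (n_samples, dim).
    Returns a value in [0, 1]; 1.0 means identical representations up to
    rotation/isotropic scaling, values near 0 mean dissimilar features."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(1)
# Hypothetical features: stage-2 activations drift only slightly from
# stage-1, mimicking the cross-stage similarity the paper describes.
feats_stage1 = rng.normal(size=(64, 16))
feats_stage2 = feats_stage1 + 0.1 * rng.normal(size=(64, 16))
print("CKA(stage1, stage2):", linear_cka(feats_stage1, feats_stage2))
```

High CKA between stages is consistent with the paper's explanation: if later-stage models keep representations close to earlier-stage ones, gradients computed on the earlier model remain useful attack directions against the later one.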
🛡️ Threat Analysis
The paper focuses on adversarial examples (gradient-based attacks: FGSM, PGD) crafted at inference time against earlier-stage models and transferred to attack later-stage classification models: a novel evasion attack scenario that uses the earlier stage as a surrogate.
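The threat model above can be sketched end to end. This is a minimal toy, not the paper's implementation: the "stages" are linear softmax classifiers whose weights drift slightly between stages, and the attack is single-step FGSM crafted on the stage-1 surrogate and applied to the stage-2 victim.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_linear(W, x, y, eps):
    """FGSM on a linear softmax classifier: step of size eps in the
    sign direction of the cross-entropy gradient w.r.t. the input."""
    p = softmax(W @ x)
    onehot = np.zeros(W.shape[0])
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)  # d(cross-entropy)/dx for logits Wx
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
# Hypothetical earlier-stage (surrogate) and later-stage (victim) models;
# the later-stage weights are a small perturbation of the earlier-stage
# ones, mirroring the cross-stage model similarity the paper reports.
W_stage1 = rng.normal(size=(3, 8))
W_stage2 = W_stage1 + 0.05 * rng.normal(size=(3, 8))

x = rng.normal(size=8)
y = int(np.argmax(W_stage2 @ x))            # label the victim predicts on clean x
x_adv = fgsm_linear(W_stage1, x, y, eps=0.5)  # crafted only on the earlier stage

print("clean victim prediction:", int(np.argmax(W_stage2 @ x)))
print("adv   victim prediction:", int(np.argmax(W_stage2 @ x_adv)))
```

The attacker never queries gradients of the stage-2 model, yet the perturbation transfers when the two stages are similar; a multi-step PGD variant would simply repeat the signed-gradient step with projection onto the eps-ball.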