
Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Haidong Kang 1, Wei Wu 2, Hanling Wang 3

0 citations · 24 references · arXiv


Published on arXiv · 2512.03882

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

ACraft degrades state-of-the-art FSCIL model performance more severely than human expert-designed attacks (PGD, FGSM) while incurring lower attack costs

ACraft

Novel technique introduced


Few-shot class-incremental learning (FSCIL) is a realistic and challenging continual-learning paradigm that incrementally learns unseen classes from only a few training examples while overcoming catastrophic forgetting on base classes. Previous efforts have primarily centered on designing more effective FSCIL approaches; by contrast, far less attention has been devoted to the security issues of FSCIL. This paper provides a holistic study of the impact of attacks on FSCIL. We first derive insights by systematically exploring how human expert-designed attack methods (i.e., PGD, FGSM) affect FSCIL. We find that these methods either fail to attack base classes or incur huge labor costs because they rely on extensive expert knowledge, which highlights the need for a specialized attack method for FSCIL. Grounded in these insights, we propose ACraft, a simple yet effective method that leverages Large Language Models (LLMs) to automatically steer and discover optimal attack methods for FSCIL without human experts. Moreover, to improve the reasoning between LLMs and FSCIL, we introduce Proximal Policy Optimization (PPO) based reinforcement learning, establishing positive feedback so that LLMs generate better attack methods in the next generation. Experiments on mainstream benchmarks show that ACraft significantly degrades the performance of state-of-the-art FSCIL methods, substantially outperforms human expert-designed attack methods, and maintains the lowest attack costs.
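The expert-designed baselines the paper contrasts against (FGSM, PGD) can be sketched in a few lines. The sketch below is illustrative, not the paper's code: it assumes pixel inputs scaled to [0, 1] and a caller-supplied gradient function for the model's loss.

```python
import numpy as np

def fgsm_attack(x, grad, epsilon=0.03):
    """Fast Gradient Sign Method: one step in the direction of the sign of
    the loss gradient, bounded by epsilon in the L-infinity norm."""
    return np.clip(x + epsilon * np.sign(grad), 0.0, 1.0)

def pgd_attack(x, grad_fn, epsilon=0.03, alpha=0.01, steps=10):
    """Projected Gradient Descent: iterated signed-gradient steps, each
    projected back into the epsilon-ball around the original input."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project to epsilon-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # keep valid pixel range
    return x_adv
```

Both attacks assume white-box gradient access and fixed, hand-chosen hyperparameters (epsilon, alpha, steps), which is exactly the expert-knowledge dependence the paper argues makes them costly to adapt to FSCIL.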


Key Contributions

  • Systematic empirical analysis revealing that standard adversarial attacks (PGD, FGSM) fail to adequately attack FSCIL systems, either because they are ineffective on base classes or because they carry prohibitive expert labor costs
  • ACraft framework that leverages LLMs to automatically generate optimal attack methods for FSCIL without requiring human expert knowledge
  • PPO-based reinforcement learning loop that establishes positive feedback between LLM-generated attack candidates and FSCIL model performance, iteratively improving attack quality across generations
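The LLM-in-the-loop search described in the contributions can be sketched as a generation loop. This is a minimal sketch, not the authors' implementation: `propose` and `evaluate` are hypothetical stand-ins for the LLM call and the FSCIL evaluation, and a simple scalar reward (accuracy drop minus attack cost) replaces the paper's full PPO update.

```python
def discovery_loop(propose, evaluate, generations=5):
    """Illustrative attack-discovery loop: each generation the proposer
    emits an attack candidate, the candidate is scored against the FSCIL
    model, and the best-so-far candidate plus its reward are fed back to
    steer the next proposal round (the paper's "positive feedback")."""
    best, best_reward = None, float("-inf")
    feedback = None
    for _ in range(generations):
        candidate = propose(feedback)          # LLM call in the real system
        acc_drop, cost = evaluate(candidate)   # run candidate against FSCIL model
        reward = acc_drop - cost               # scalar reward; the paper uses PPO
        if reward > best_reward:
            best, best_reward = candidate, reward
        feedback = (best, best_reward)         # signal for the next generation
    return best, best_reward
```

The key design point the sketch preserves is that the search is closed-loop: the proposer conditions on how earlier candidates actually performed, so attack quality can improve across generations without human intervention.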

🛡️ Threat Analysis

Input Manipulation Attack

ACraft automatically discovers and crafts adversarial attack methods (building on gradient-based attacks such as PGD and FGSM) that cause misclassification in FSCIL models at inference time, making it a direct input manipulation attack.


Details

Domains
vision, nlp
Model Types
cnn, transformer, llm, rl
Threat Tags
white_box, inference_time, targeted
Datasets
miniImageNet, CIFAR-100, CUB-200
Applications
few-shot class-incremental learning, image classification