
ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses

Ningyuan He 1, Ronghong Huang 1, Qianqian Tang 2, Hongyu Wang 1, Xianghang Mi 1,3, Shanqing Guo 2

0 citations · 17 references · arXiv


Published on arXiv · 2601.21586

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves up to 95.3% attack success rate on ICL-based LLM classifiers using zero-query black-box attacks, drastically outperforming traditional NLP attacks under the same constraints

ICL-Evader

Novel technique introduced


Abstract

In-context learning (ICL) has become a powerful, data-efficient paradigm for text classification using large language models. However, its robustness against realistic adversarial threats remains largely unexplored. We introduce ICL-Evader, a novel black-box evasion attack framework that operates under a highly practical zero-query threat model, requiring no access to model parameters, gradients, or query-based feedback during attack generation. We design three novel attacks, Fake Claim, Template, and Needle-in-a-Haystack, that exploit inherent limitations of LLMs in processing in-context prompts. Evaluated across sentiment analysis, toxicity, and illicit promotion tasks, our attacks significantly degrade classifier performance (e.g., achieving up to 95.3% attack success rate), drastically outperforming traditional NLP attacks, which prove ineffective under the same constraints. To counter these vulnerabilities, we systematically investigate defense strategies and identify a joint defense recipe that effectively mitigates all attacks with minimal utility loss (<5% accuracy degradation). Finally, we translate our defensive insights into an automated tool that proactively fortifies standard ICL prompts against adversarial evasion. This work provides a comprehensive security assessment of ICL, revealing critical vulnerabilities and offering practical solutions for building more robust systems. Our source code and evaluation datasets are publicly available at: https://github.com/ChaseSecurity/ICL-Evader.
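For context, the target of these attacks is a standard few-shot ICL classification prompt. The sketch below assembles one; the task instruction, demonstrations, and template wording are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of an ICL-based sentiment classifier prompt, the kind of
# setup ICL-Evader's zero-query attacks target. All demonstration text and
# template wording here are illustrative, not taken from the paper.

def build_icl_prompt(demonstrations, query):
    """Assemble a few-shot classification prompt from (text, label) pairs."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in demonstrations:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # model completes the label
    return "\n".join(lines)

demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("A dull, lifeless remake.", "negative"),
]
prompt = build_icl_prompt(demos, "Surprisingly fun for a sequel.")
```

Because the adversary controls only the query text, attacks like Template can mimic this demonstration structure inside the query itself, which is what makes the zero-query threat model practical.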


Key Contributions

  • Three novel zero-query black-box evasion attacks (Fake Claim, Template, Needle-in-a-Haystack) exploiting LLM limitations in ICL prompt processing, achieving up to 95.3% attack success rate
  • Systematic defense investigation identifying a joint recipe that mitigates all three attacks with less than 5% accuracy degradation
  • Automated tool that proactively fortifies standard ICL prompts against adversarial evasion before deployment

🛡️ Threat Analysis


Details

Domains: nlp
Model Types: llm, transformer
Threat Tags: black_box, inference_time, targeted
Applications: text classification, sentiment analysis, toxicity detection, illicit promotion detection