attack 2025

Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience

Xi Wang ¹, Songlei Jian ¹, Shasha Li ¹, Xiaopeng Li ¹, Bin Ji ¹, Jun Ma ¹, Xiaodong Liu ¹,², Jing Wang ¹, Feilong Bao ², Jianfeng Zhang ¹, Baosheng Wang ¹, Jie Yu ¹

¹ National University of Defense Technology

² Inner Mongolia University

0 citations

Published on arXiv

2508.19292

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

JailExpert achieves an average 17% increase in attack success rate and 2.7x improvement in attack efficiency compared to state-of-the-art black-box jailbreak methods.
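To make the two headline numbers concrete, the sketch below shows one way such figures could be computed from evaluation logs. It is an illustration under stated assumptions, not the paper's evaluation code: the `AttackRun` record, the helper names, and the sample data are all hypothetical, and "efficiency" is assumed here to mean the mean number of target-model queries per attempt. This page also does not say whether the 17% gain is in absolute percentage points or relative terms; the sketch reports the absolute difference.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AttackRun:
    """One evaluation attempt (hypothetical record layout)."""
    succeeded: bool    # whether the attempt was judged successful
    queries_used: int  # target-model queries spent on the attempt


def attack_success_rate(runs: List[AttackRun]) -> float:
    """Fraction of attempts judged successful."""
    return sum(r.succeeded for r in runs) / len(runs)


def mean_queries(runs: List[AttackRun]) -> float:
    """Average query cost per attempt (assumed efficiency measure)."""
    return sum(r.queries_used for r in runs) / len(runs)


def compare(baseline: List[AttackRun], candidate: List[AttackRun]) -> Tuple[float, float]:
    """Return (absolute ASR gain, query-cost ratio baseline/candidate)."""
    asr_gain = attack_success_rate(candidate) - attack_success_rate(baseline)
    speedup = mean_queries(baseline) / mean_queries(candidate)
    return asr_gain, speedup


# Made-up logs, for illustration only (not results from the paper).
baseline = [AttackRun(True, 10), AttackRun(False, 20),
            AttackRun(True, 10), AttackRun(False, 20)]
candidate = [AttackRun(True, 5), AttackRun(True, 5),
             AttackRun(True, 5), AttackRun(False, 5)]

asr_gain, speedup = compare(baseline, candidate)
print(f"ASR gain: {asr_gain:.2f}, speedup: {speedup:.1f}x")
```

With these made-up logs the baseline succeeds on 2 of 4 attempts at 15 queries each on average, and the candidate on 3 of 4 at 5 queries each, so the comparison yields a 0.25 absolute ASR gain and a 3.0x query-cost ratio.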

JailExpert

Novel technique introduced

Large language models (LLMs) generate human-aligned content under certain safety constraints. However, the known technique of "jailbreak prompts" can circumvent safety alignment measures and induce LLMs to output malicious content. Research on jailbreaking can help identify vulnerabilities in LLMs and guide the development of robust security frameworks. To keep attack templates from becoming obsolete as models evolve, existing methods adopt iterative mutation and dynamic optimization to enable more automated jailbreak attacks. However, these methods face two challenges, inefficiency and repetitive optimization, because they overlook the value of past attack experiences. To better integrate past attack experiences into current jailbreak attempts, we propose JailExpert, an automated jailbreak framework that is the first to achieve a formal representation of experience structure, group experiences based on semantic drift, and support dynamic updating of the experience pool. Extensive experiments demonstrate that JailExpert significantly improves both attack effectiveness and efficiency. Compared to the current state-of-the-art black-box jailbreak methods, JailExpert achieves an average increase of 17% in attack success rate and a 2.7-times improvement in attack efficiency. Our implementation is available at https://github.com/xiZAIzai/JailExpert

Key Contributions

First formal representation of jailbreak experience structure (queries, attack strategies, success probabilities) enabling structured reuse of past attack knowledge
Semantic-drift-based grouping of jailbreak experiences to extract representative patterns and avoid repeated optimization across different LLMs and scenarios
Dynamic experience pool update mechanism that continuously refines attack templates, achieving 17% higher attack success rate and 2.7x efficiency improvement over SOTA black-box jailbreak methods

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time

Datasets

AdvBench

Applications

llm safety alignment bypass, automated jailbreaking

Similar Papers

StealthGraph: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation

VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy

Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts

AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

Adversarial versification in portuguese as a jailbreak operator in LLMs

An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks

Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models