
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits

Neeladri Bhuiya 1,2, Madhav Aggarwal 1, Diptanshu Purwar 1

Published on arXiv · 2510.17947 · 49 references

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

PLAGUE achieves 81.4% ASR on OpenAI o3 and 67.3% on Claude Opus 4.1 (>30% improvement over prior multi-turn baselines) using a lifelong-learning-inspired three-phase attack structure.

PLAGUE

Novel technique introduced


Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. Yet even as capabilities improve, models remain susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be injected subtly across a conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency, and effectiveness remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks, inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner, and Finisher) that enable systematic, information-rich exploration of the multi-turn attack family. Evaluations show that red-teaming agents built with PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models within a comparable or smaller query budget. In particular, PLAGUE achieves an ASR (measured by StrongReject) of 81.4% on OpenAI's o3 and 67.3% on Anthropic's Claude Opus 4.1, two models considered highly resistant to jailbreaks in the safety literature. Our work offers tools and insights into the importance of plan initialization, context optimization, and lifelong learning in crafting multi-turn attacks for comprehensive model vulnerability evaluation.


Key Contributions

  • PLAGUE: a plug-and-play three-phase (Primer, Planner, Finisher) framework for designing multi-turn LLM jailbreak attacks inspired by lifelong-learning agents
  • Systematic exploration of the multi-turn attack family enabling context optimization, plan initialization, and adaptive query strategies
  • State-of-the-art attack success rates — 81.4% on OpenAI o3 and 67.3% on Claude Opus 4.1 (>30% improvement over baselines) using comparable or fewer queries
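The three-phase structure above can be sketched as a simple control loop. This is a minimal, hypothetical illustration of how Primer (plan initialization), Planner (adaptive turn-by-turn execution), and Finisher (final elicitation) might compose; every name, string, and adaptation rule here is an assumption for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of PLAGUE's three phases; all identifiers are illustrative.
from dataclasses import dataclass, field

@dataclass
class AttackState:
    goal: str                                    # objective under red-team evaluation
    plan: list = field(default_factory=list)     # remaining planned turns
    history: list = field(default_factory=list)  # (prompt, response) pairs

def primer(state: AttackState, n_turns: int = 3) -> AttackState:
    """Phase 1: plan initialization -- seed a multi-turn escalation plan."""
    state.plan = [f"turn {i}: benign framing of {state.goal!r}" for i in range(n_turns)]
    return state

def planner(state: AttackState, target) -> AttackState:
    """Phase 2: execute the plan turn by turn, adapting each step to the
    target's response (context optimization / lifelong-learning-style feedback)."""
    while state.plan:
        prompt = state.plan.pop(0)
        response = target(prompt)
        state.history.append((prompt, response))
        # Illustrative adaptation rule: reframe a refused turn once.
        if "refusal" in response and not prompt.startswith("reframed"):
            state.plan.insert(0, "reframed: " + prompt)
    return state

def finisher(state: AttackState, target) -> AttackState:
    """Phase 3: issue the final turn that attempts to elicit the objective."""
    final = f"final escalation toward {state.goal!r}"
    state.history.append((final, target(final)))
    return state

if __name__ == "__main__":
    mock_target = lambda prompt: "compliant response"  # stand-in for a target LLM
    state = finisher(planner(primer(AttackState("test objective")), mock_target), mock_target)
    print(len(state.history))  # 3 planned turns + 1 finisher turn -> 4
```

In this sketch the Planner consumes the plan produced by the Primer and may rewrite it in flight, which mirrors the paper's framing of plan initialization and context optimization as separable stages.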

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
StrongReject
Applications
LLM safety evaluation · red-teaming · jailbreak resistance