attack 2025

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models

Ragib Amin Nihal¹,², Rui Wen¹, Kazuhiro Nakadai¹, Jun Sakuma¹,²

¹ Institute of Science Tokyo

² RIKEN AIP

1 citation · 59 references · arXiv

Published on arXiv: 2510.08859

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

PE-CoA achieves state-of-the-art multi-turn jailbreak success across twelve LLMs and ten harm categories, demonstrating that models exhibit distinct, non-generalizing vulnerability profiles to different conversational patterns.

Novel technique introduced: PE-CoA (Pattern Enhanced Chain of Attack)

Large language models (LLMs) remain vulnerable to multi-turn jailbreaking attacks that exploit conversational context to gradually bypass safety constraints. These attacks target different harm categories through distinct conversational approaches. Existing multi-turn methods often rely on heuristic or ad hoc exploration strategies, providing limited insight into underlying model weaknesses. The relationship between conversation patterns and model vulnerabilities across harm categories remains poorly understood. We propose Pattern Enhanced Chain of Attack (PE-CoA), a framework of five conversation patterns to construct multi-turn jailbreaks through natural dialogue. Evaluating PE-CoA on twelve LLMs spanning ten harm categories, we achieve state-of-the-art performance and uncover pattern-specific vulnerabilities and LLM behavioral characteristics: models exhibit distinct weakness profiles, defense against one pattern does not generalize to others, and model families share similar failure modes. These findings highlight limitations of safety training and indicate the need for pattern-aware defenses. Code is available at: https://github.com/Ragib-Amin-Nihal/PE-CoA

Key Contributions

PE-CoA framework defining five empirically validated conversation patterns (e.g., hypothetical, information-seeking, personal narrative) for structured multi-turn jailbreak construction
Systematic vulnerability analysis across twelve LLMs and ten harm categories, revealing model-family-level failure modes and pattern-specific weakness profiles
Finding that safety defenses against one conversational pattern do not generalize to others, exposing fundamental gaps in current alignment training
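
The pattern-specific weakness profiles described above amount to a per-model, per-pattern table of attack success rates. A minimal sketch of how such a table might be tabulated from already-judged evaluation outcomes follows. It assumes a simple record format; the model names, the outcomes, and the helper names `success_rates` and `weakest_pattern` are hypothetical placeholders for illustration, not the paper's data or the PE-CoA codebase.

```python
from collections import defaultdict

# Hypothetical judged outcomes: (model, pattern, jailbreak_succeeded).
# Model names and outcomes are placeholders, not results from the paper.
RECORDS = [
    ("model_a", "hypothetical", True),
    ("model_a", "hypothetical", False),
    ("model_a", "information-seeking", False),
    ("model_a", "information-seeking", False),
    ("model_b", "hypothetical", False),
    ("model_b", "hypothetical", False),
    ("model_b", "personal narrative", True),
    ("model_b", "personal narrative", True),
]


def success_rates(records):
    """Return {model: {pattern: fraction of attempts judged successful}}."""
    # Each cell holds [successes, attempts] for one (model, pattern) pair.
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for model, pattern, succeeded in records:
        cell = counts[model][pattern]
        cell[0] += int(succeeded)
        cell[1] += 1
    return {
        model: {pattern: s / n for pattern, (s, n) in patterns.items()}
        for model, patterns in counts.items()
    }


def weakest_pattern(profile):
    """Return the pattern with the highest success rate for one model."""
    return max(profile, key=profile.get)


rates = success_rates(RECORDS)
for model, profile in sorted(rates.items()):
    print(model, profile, "most vulnerable to:", weakest_pattern(profile))
```

Comparing the rows of such a table is one way to check whether a model that resists one pattern also resists the others, which is the generalization question the third contribution raises.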

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Datasets

AdvBench

Applications

llm safety alignment, red-teaming, chatbot safety

Similar Papers

Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

The Echo Chamber Multi-Turn LLM Jailbreak

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software