ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack
Xingwei Lin, Wenhao Lin, Sicong Cao, Jiahao Yu, Renke Huang, Lei Xue, Chunming Wu
Published on arXiv
2601.20903
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves a state-of-the-art average Attack Success Rate of 97.1% across eight commercial and open-source LLMs, outperforming prior multi-turn jailbreak baselines.
ICON (Intent-Context Coupling)
Novel technique introduced
Multi-turn jailbreak attacks have emerged as a critical threat to Large Language Models (LLMs), bypassing safety mechanisms by progressively constructing adversarial contexts from scratch and incrementally refining prompts. However, existing methods suffer from inefficient incremental context construction, which requires step-by-step interaction with the LLM, and they often stagnate in suboptimal regions due to surface-level optimization. In this paper, we characterize the Intent-Context Coupling phenomenon, revealing that LLM safety constraints are significantly relaxed when a malicious intent is coupled with a semantically congruent context pattern. Driven by this insight, we propose ICON, an automated multi-turn jailbreak framework that efficiently constructs an authoritative-style context via prior-guided semantic routing. Specifically, ICON first routes the malicious intent to a congruent context pattern (e.g., Scientific Research) and instantiates it into an attack prompt sequence. This sequence progressively builds the authoritative-style context and ultimately elicits prohibited content. In addition, ICON incorporates a Hierarchical Optimization Strategy that combines local prompt refinement with global context switching, preventing the attack from stagnating in ineffective contexts. Experimental results across eight SOTA LLMs demonstrate the effectiveness of ICON, achieving a state-of-the-art average Attack Success Rate (ASR) of 97.1%. Code is available at https://github.com/xwlin-roy/ICON.
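The control flow the abstract describes (route the intent to a context pattern, instantiate a multi-turn prompt sequence, then locally refine or globally switch contexts) can be sketched at a high level. This is an illustrative scaffold only, not the paper's implementation: every name here (`CONTEXT_PATTERNS`, `route_intent`, `instantiate_sequence`, `judge_success`, and so on) is hypothetical, and the model and judge calls are inert stubs with no adversarial content.

```python
# Illustrative control-flow sketch of the ICON pipeline described above.
# All names are hypothetical; the target-model and judge calls are stubs.

# Hypothetical library of authoritative-style context patterns.
CONTEXT_PATTERNS = ["Scientific Research", "Policy Analysis", "Historical Archive"]

def route_intent(intent: str) -> str:
    """Prior-guided semantic routing: map an intent to a congruent pattern (stub)."""
    return CONTEXT_PATTERNS[hash(intent) % len(CONTEXT_PATTERNS)]

def instantiate_sequence(intent: str, pattern: str, n_turns: int = 3) -> list[str]:
    """Instantiate the chosen pattern into a multi-turn prompt sequence (stub)."""
    return [f"[{pattern}] turn {t + 1} toward: {intent}" for t in range(n_turns)]

def query_target(prompt: str) -> str:
    """Stand-in for a call to the target LLM."""
    return f"response to: {prompt}"

def judge_success(response: str) -> bool:
    """Stand-in for an attack-success judge; stubbed to always report failure."""
    return False

def refine(prompt: str) -> str:
    """Local prompt refinement (stub)."""
    return prompt + " (refined)"

def icon_attack(intent: str, max_local: int = 2):
    """Run the loop: local refinement first, global context switching on stagnation."""
    pattern = route_intent(intent)
    tried = set()
    while True:
        tried.add(pattern)
        for prompt in instantiate_sequence(intent, pattern):
            for _ in range(max_local + 1):
                if judge_success(query_target(prompt)):
                    return pattern, prompt  # attack succeeded
                prompt = refine(prompt)  # local prompt refinement
        # Global context switching: abandon an ineffective context pattern.
        remaining = [p for p in CONTEXT_PATTERNS if p not in tried]
        if not remaining:
            return None  # all context patterns exhausted
        pattern = remaining[0]
```

With the judge stubbed to always fail, the loop exercises both levels of the hierarchy: it exhausts local refinements within each pattern, then switches patterns globally until none remain.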
Key Contributions
- Characterizes the Intent-Context Coupling phenomenon: LLM safety constraints relax significantly when malicious intent is paired with a semantically congruent authoritative context pattern
- Proposes ICON, a multi-turn jailbreak framework that uses prior-guided semantic routing to construct adversarial context sequences directly, rather than building context iteratively from scratch
- Introduces a Hierarchical Optimization Strategy combining local prompt refinement and global context switching to escape suboptimal attack regions