
Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

Devang Kulshreshtha 1, Hang Su 1, Chinmay Hegde 2, Haohan Wang 3

0 citations · 26 references · arXiv


Published on arXiv · 2601.02670

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves 97-100% attack success rate across GPT, Claude, and Llama models using an average of only ~6.4 queries — 3x more query-efficient than prior multi-turn jailbreak methods

LATS (Lexical Anchor Tree Search)

Novel technique introduced


Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand large query budgets. These resource requirements make jailbreaking expensive, and the queries generated by attacker LLMs often consist of non-interpretable random prefixes. This paper introduces Lexical Anchor Tree Search (LATS), addressing these limitations with an attacker-LLM-free method that operates purely via lexical anchor injection. LATS reformulates jailbreaking as a breadth-first tree search over multi-turn dialogues, where each node incrementally injects missing content words from the attack goal into benign prompts. Evaluations on AdvBench and HarmBench demonstrate that LATS achieves 97-100% ASR on the latest GPT, Claude, and Llama models with an average of only ~6.4 queries, compared to the 20+ queries required by other methods. These results highlight conversational structure as a potent and under-protected attack surface, while demonstrating superior query efficiency in an era where high ASR is readily achievable. Our code will be released to support reproducibility.
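The abstract describes the search as a BFS over multi-turn dialogues, with each node injecting one more missing content word ("lexical anchor") from the attack goal into a benign prompt. A minimal sketch of that loop, assuming a substring-style stopword filter and caller-supplied `query_model` / `is_jailbroken` callbacks (all names here are illustrative, not the paper's actual implementation):

```python
from collections import deque

def content_words(goal):
    """Naive content-word extraction: drop a few stopword-like tokens.
    (Assumption: the paper's actual tokenization/filtering is unspecified here.)"""
    stop = {"a", "an", "the", "to", "of", "how", "for", "in", "on"}
    return [w for w in goal.lower().split() if w not in stop]

def lats_bfs(goal, benign_prompt, query_model, is_jailbroken, max_queries=20):
    """Breadth-first search over dialogue variants: each child node injects
    one additional missing anchor word from the goal into the benign prompt."""
    anchors = content_words(goal)
    queue = deque([[]])               # each node = list of injected anchor words
    queries = 0
    while queue and queries < max_queries:
        injected = queue.popleft()
        prompt = benign_prompt + " " + " ".join(injected)
        reply = query_model(prompt)   # one query per expanded node
        queries += 1
        if is_jailbroken(reply):
            return injected, queries
        for w in anchors:             # expand: add each still-missing anchor
            if w not in injected:
                queue.append(injected + [w])
    return None, queries
```

Because the search is breadth-first, the first success is found with the fewest injected anchors, which matches the paper's emphasis on query efficiency.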


Key Contributions

  • LATS: a BFS-based multi-turn jailbreak algorithm requiring no attacker LLM, using only lexical anchor word injection into benign prompts to elicit harmful outputs
  • Achieves 97-100% ASR on GPT, Claude, and Llama models with ~6.4 queries on average — outperforming 6 SOTA attacks by 10%+ ASR at a fraction of the query cost
  • Demonstrates robustness against strong alignment defenses including PromptGuard, In-Context Demonstrations, and Goal Prioritization

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
AdvBench · HarmBench
Applications
chatbot safety · llm alignment · red-teaming