Attack · 2025

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang

9 citations · 1 influential · 54 references · arXiv


Published on arXiv · 2510.21189

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

JAIL-CON achieves an average ASR of 0.95 across 6 LLMs without a guardrail and 0.64 with a guardrail, more than doubling the next-best guardrail-bypassing attack (ASR 0.27).

JAIL-CON

Novel technique introduced


Despite their superior performance on a wide range of domains, large language models (LLMs) remain vulnerable to misuse for generating harmful content, a risk that has been further amplified by various jailbreak attacks. Existing jailbreak attacks mainly follow sequential logic, where LLMs understand and answer each given task one by one. However, concurrency, a natural extension of the sequential scenario, has been largely overlooked. In this work, we first propose a word-level method to enable task concurrency in LLMs, where adjacent words encode divergent intents. Although LLMs maintain strong utility in answering concurrent tasks, which is demonstrated by our evaluations on mathematical and general question-answering benchmarks, we notably observe that combining a harmful task with a benign one significantly reduces the probability of it being filtered by the guardrail, showing the potential risks associated with concurrency in LLMs. Based on these findings, we introduce $\texttt{JAIL-CON}$, an iterative attack framework that $\underline{\text{JAIL}}$breaks LLMs via task $\underline{\text{CON}}$currency. Experiments on widely-used LLMs demonstrate the strong jailbreak capabilities of $\texttt{JAIL-CON}$ compared to existing attacks. Furthermore, when the guardrail is applied as a defense, compared to the sequential answers generated by previous attacks, the concurrent answers in our $\texttt{JAIL-CON}$ exhibit greater stealthiness and are less detectable by the guardrail, highlighting the unique feature of task concurrency in jailbreaking LLMs.


Key Contributions

  • Proposes word-level task concurrency for LLMs — interleaving adjacent words from two tasks to encode divergent intents in a single prompt
  • Introduces JAIL-CON, an iterative jailbreak framework (task combination, concurrent execution, shadow judge) achieving average ASR of 0.95 without guardrail and 0.64 with guardrail — far exceeding the next-best attack at 0.27
  • Demonstrates that concurrent outputs are significantly more stealthy than sequential jailbreak outputs, evading guardrail detection at a much higher rate
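The core encoding idea above (interleaving adjacent words from two tasks into one prompt) can be sketched as follows. This is a minimal illustration of word-level interleaving, not the paper's actual JAIL-CON prompt construction; the helper name `interleave_tasks` and the benign example tasks are assumptions for demonstration.

```python
from itertools import zip_longest

def interleave_tasks(task_a: str, task_b: str) -> str:
    """Interleave the words of two task prompts so that adjacent
    words belong to different tasks (word-level task concurrency).
    Hypothetical helper; the paper's exact encoding may differ."""
    words_a = task_a.split()
    words_b = task_b.split()
    merged = []
    for a, b in zip_longest(words_a, words_b):
        if a is not None:
            merged.append(a)
        if b is not None:
            merged.append(b)
    return " ".join(merged)

# Two benign tasks, used purely for illustration.
prompt = interleave_tasks(
    "What is the capital of France?",
    "Compute the sum of 2 and 3.",
)
print(prompt)
# → What Compute is the the sum capital of of 2 France? and 3.
```

In the attack setting, one of the two interleaved tasks is harmful and the other benign; the paper's finding is that the merged prompt (and the concurrent answer) is far less likely to be flagged by a guardrail than the harmful task presented sequentially.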

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
JailbreakBench · GSM8K · TruthfulQA
Applications
llm safety guardrails · chat llms · content filtering