Attack · 2025

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang

9 citations · 1 influential · 54 references · arXiv


Published on arXiv · 2510.21189

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

JAIL-CON achieves an average ASR of 0.95 across 6 LLMs without a guardrail and 0.64 with a guardrail, more than doubling the next-best guardrail-bypassing attack (ASR 0.27).

JAIL-CON

Novel technique introduced


Despite their superior performance on a wide range of domains, large language models (LLMs) remain vulnerable to misuse for generating harmful content, a risk that has been further amplified by various jailbreak attacks. Existing jailbreak attacks mainly follow sequential logic, where LLMs understand and answer each given task one by one. However, concurrency, a natural extension of the sequential scenario, has been largely overlooked. In this work, we first propose a word-level method to enable task concurrency in LLMs, where adjacent words encode divergent intents. Although LLMs maintain strong utility in answering concurrent tasks, which is demonstrated by our evaluations on mathematical and general question-answering benchmarks, we notably observe that combining a harmful task with a benign one significantly reduces the probability of it being filtered by the guardrail, showing the potential risks associated with concurrency in LLMs. Based on these findings, we introduce $\texttt{JAIL-CON}$, an iterative attack framework that $\underline{\text{JAIL}}$breaks LLMs via task $\underline{\text{CON}}$currency. Experiments on widely-used LLMs demonstrate the strong jailbreak capabilities of $\texttt{JAIL-CON}$ compared to existing attacks. Furthermore, when the guardrail is applied as a defense, compared to the sequential answers generated by previous attacks, the concurrent answers in our $\texttt{JAIL-CON}$ exhibit greater stealthiness and are less detectable by the guardrail, highlighting the unique feature of task concurrency in jailbreaking LLMs.


Key Contributions

  • Proposes word-level task concurrency for LLMs — interleaving adjacent words from two tasks to encode divergent intents in a single prompt
  • Introduces JAIL-CON, an iterative jailbreak framework (task combination, concurrent execution, shadow judge) achieving average ASR of 0.95 without guardrail and 0.64 with guardrail — far exceeding the next-best attack at 0.27
  • Demonstrates that concurrent outputs are significantly more stealthy than sequential jailbreak outputs, evading guardrail detection at a much higher rate
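The core encoding idea above (interleaving adjacent words from two tasks into one prompt) can be sketched as follows. This is a minimal illustration of word-level interleaving, not the paper's actual JAIL-CON prompt construction; the helper name `interleave_tasks` and the benign example tasks are assumptions for demonstration.

```python
from itertools import zip_longest

def interleave_tasks(task_a: str, task_b: str) -> str:
    """Interleave the words of two task prompts so that adjacent
    words belong to different tasks (word-level task concurrency).
    Hypothetical helper; the paper's exact encoding may differ."""
    words_a = task_a.split()
    words_b = task_b.split()
    merged = []
    for a, b in zip_longest(words_a, words_b):
        if a is not None:
            merged.append(a)
        if b is not None:
            merged.append(b)
    return " ".join(merged)

# Two benign tasks, used purely for illustration.
prompt = interleave_tasks(
    "What is the capital of France?",
    "Compute the sum of 2 and 3.",
)
print(prompt)
# → What Compute is the the sum capital of of 2 France? and 3.
```

In the attack setting, one of the two interleaved tasks is harmful and the other benign; the paper's finding is that the merged prompt (and the concurrent answer) is far less likely to be flagged by a guardrail than the harmful task presented sequentially.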

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
JailbreakBench · GSM8K · TruthfulQA
Applications
llm safety guardrails · chat llms · content filtering