
Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Fan Yang


Published on arXiv

2603.10091

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Thinking collapse rates reach up to 17% and response repetition rates up to 60%, demonstrating that multi-stream perturbation simultaneously bypasses safety mechanisms and destabilizes the reasoning process of thinking LLMs.

Multi-stream Perturbation Attack (MSPK)

Novel technique introduced


The widespread adoption of thinking mode in large language models (LLMs) has significantly enhanced complex task processing capabilities while introducing new security risks. When subjected to jailbreak attacks, the step-by-step reasoning process may cause models to generate more detailed harmful content. We observe that thinking mode exhibits unique vulnerabilities when processing interleaved multiple tasks. Based on this observation, we propose the multi-stream perturbation attack, which generates superimposed interference by interweaving multiple task streams within a single prompt. We design three perturbation strategies: multi-stream interleaving, inversion perturbation, and shape transformation, which disrupt the thinking process through concurrent task interleaving, character reversal, and format constraints, respectively. On the JailbreakBench, AdvBench, and HarmBench datasets, our method achieves attack success rates exceeding those of most existing methods across mainstream models, including the Qwen3 series, DeepSeek, Qwen3-Max, and Gemini 2.5 Flash. Experiments show that thinking collapse rates and response repetition rates reach up to 17% and 60% respectively, indicating that multi-stream perturbation not only bypasses safety mechanisms but also causes the thinking process to collapse or produce repetitive outputs.
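The abstract names three perturbation strategies but does not give their construction; a minimal sketch of one plausible reading is below. All function names, the word-level interleaving granularity, and the fixed-width layout are assumptions for illustration, not the paper's actual prompt-building procedure.

```python
# Illustrative sketch of the three perturbation strategies named in the
# abstract. The exact construction used in the paper is not specified here;
# word-level granularity and the column width are assumed for illustration.

def interleave_streams(task_a: str, task_b: str) -> str:
    """Multi-stream interleaving: alternate words from two task streams."""
    a, b = task_a.split(), task_b.split()
    merged = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            merged.append(a[i])
        if i < len(b):
            merged.append(b[i])
    return " ".join(merged)

def invert(text: str) -> str:
    """Inversion perturbation: reverse the characters of each word."""
    return " ".join(word[::-1] for word in text.split())

def shape_transform(text: str, width: int = 8) -> str:
    """Shape transformation: impose a rigid fixed-width layout as a
    format constraint on the prompt."""
    words = text.split()
    rows = [" ".join(words[i:i + width]) for i in range(0, len(words), width)]
    return "\n".join(rows)

# Stacking the three strategies on two benign placeholder tasks:
prompt = shape_transform(invert(interleave_streams(
    "Summarize the plot of Hamlet",
    "List three sorting algorithms")))
```

The sketch composes the strategies in sequence so the model must simultaneously de-interleave the streams, undo the character reversal, and parse the imposed layout, which is the kind of superimposed interference the abstract attributes to the attack.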


Key Contributions

  • Proposes multi-stream perturbation attack with three strategies (multi-stream interleaving, inversion perturbation, shape transformation) specifically targeting the thinking process of reasoning LLMs
  • Discovers dual vulnerability in thinking-mode LLMs: safety bypass AND reasoning stability failures (thinking collapse, repetitive outputs) under multi-stream perturbation
  • Achieves attack success rates exceeding most baselines on JailbreakBench, AdvBench, and HarmBench across Qwen3, DeepSeek, and Gemini 2.5 Flash

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted
Datasets
JailbreakBench, AdvBench, HarmBench
Applications
llm safety alignment, thinking-mode / reasoning llms