The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
Yihao Zhang 1, Kai Wang 1, Jiangrong Wu 2, Haolin Wu 3, Yuxuan Zhou 4, Zeming Wei 1, Dongxian Wu 5, Xun Chen 5, Jun Sun 6, Meng Sun 1
Published on arXiv
2604.11309
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves over 90% attack success rate on GPT-4o and Gemini while evading real-world alignment defenses
Salami Attack
Novel technique introduced
Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models to bypass built-in safety constraints and generate unethical or unsafe content. Among jailbreak techniques, multi-turn attacks are more covert and persistent than their single-turn counterparts, exposing critical vulnerabilities in LLMs. However, existing multi-turn jailbreak methods suffer from two fundamental limitations that restrict their real-world impact: (a) as models become more context-aware, any explicit harmful trigger is increasingly likely to be flagged and blocked; (b) successful final-step triggers often require finely tuned, model-specific contexts, making such attacks highly context-dependent. To fill this gap, we propose Salami Slicing Risk, which chains numerous low-risk inputs that individually evade alignment thresholds but cumulatively build harmful intent until high-risk behaviors are triggered, without heavy reliance on pre-designed contextual structures. Building on this risk, we develop Salami Attack, an automatic framework universally applicable across model types and modalities. Rigorous experiments demonstrate its state-of-the-art performance across diverse models and modalities, achieving over 90% Attack Success Rate on GPT-4o and Gemini, as well as robustness against real-world alignment defenses. We also propose a defense strategy that reduces the success of the Salami Attack by at least 44.8% and achieves a maximum blocking rate of 64.8% against other multi-turn jailbreak attacks. Our findings provide critical insights into the pervasive risks of multi-turn jailbreaking and offer actionable mitigation strategies to enhance LLM security.
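The core mechanism can be illustrated with a minimal sketch of the corresponding defense: tracking risk cumulatively across a conversation rather than per turn. This is a hypothetical illustration, not the paper's implementation; the `CumulativeRiskMonitor` class, its thresholds, and the fixed risk scores are all assumptions standing in for a real moderation model.

```python
# Hypothetical sketch of a cumulative-risk defense against salami slicing:
# a turn is allowed only if it stays under a per-turn threshold AND the
# conversation's running risk total stays under a cumulative budget.
# Thresholds and scores are illustrative placeholders, not from the paper.

class CumulativeRiskMonitor:
    def __init__(self, per_turn_limit: float = 0.5, cumulative_limit: float = 1.5):
        self.per_turn_limit = per_turn_limit
        self.cumulative_limit = cumulative_limit
        self.total_risk = 0.0

    def allow(self, turn_risk: float) -> bool:
        """Return True if the turn may proceed; accumulate allowed risk."""
        if turn_risk >= self.per_turn_limit:
            return False  # caught by a conventional single-turn filter
        if self.total_risk + turn_risk >= self.cumulative_limit:
            return False  # caught by the cumulative budget
        self.total_risk += turn_risk
        return True


# Five low-risk turns (score 0.4 each) individually pass the per-turn
# filter, but the cumulative budget stops the chain at the fourth turn.
monitor = CumulativeRiskMonitor()
decisions = [monitor.allow(0.4) for _ in range(5)]
print(decisions)  # → [True, True, True, False, False]
```

The key point this sketch captures is that a per-turn filter alone would admit all five turns, whereas the cumulative budget blocks the chain once the accumulated risk approaches the threshold, which is precisely the gap the Salami Slicing Risk exploits.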
Key Contributions
- Introduces 'Salami Slicing Risk' concept: chaining low-risk inputs that individually evade detection but cumulatively trigger harmful outputs
- Develops Salami Attack framework universally applicable across model types and modalities without heavy context engineering
- Proposes a defense strategy that cuts the Salami Attack's success rate by at least 44.8% and blocks up to 64.8% of other multi-turn jailbreak attacks