The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
Yihao Zhang 1, Kai Wang 1, Jiangrong Wu 2, Haolin Wu 3, Yuxuan Zhou 4, Zeming Wei 1, Dongxian Wu 5, Xun Chen 5, Jun Sun 6, Meng Sun 1
Published on arXiv
2604.11309
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves over 90% attack success rate on GPT-4o and Gemini while evading real-world alignment defenses
Salami Attack
Novel technique introduced
Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models to bypass built-in safety constraints and generate unethical or unsafe content. Among jailbreak techniques, multi-turn attacks are more covert and persistent than their single-turn counterparts, exposing critical vulnerabilities in LLMs. However, existing multi-turn jailbreak methods suffer from two fundamental limitations that restrict their real-world impact: (a) as models become more context-aware, any explicit harmful trigger is increasingly likely to be flagged and blocked; (b) successful final-step triggers often require finely tuned, model-specific contexts, making such attacks highly context-dependent. To fill this gap, we propose Salami Slicing Risk, which chains numerous low-risk inputs that individually evade alignment thresholds but cumulatively build harmful intent until high-risk behaviors are triggered, without heavy reliance on pre-designed contextual structures. Building on this risk, we develop Salami Attack, an automatic framework universally applicable across model types and modalities. Rigorous experiments demonstrate its state-of-the-art performance across diverse models and modalities, achieving over 90% Attack Success Rate on GPT-4o and Gemini, as well as robustness against real-world alignment defenses. We also propose a defense strategy that reduces the success of the Salami Attack by at least 44.8% and achieves a maximum blocking rate of 64.8% against other multi-turn jailbreak attacks. Our findings provide critical insights into the pervasive risks of multi-turn jailbreaking and offer actionable mitigation strategies to enhance LLM security.
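The core mechanism can be illustrated with a minimal sketch of the corresponding defense: tracking risk cumulatively across a conversation rather than per turn. This is a hypothetical illustration, not the paper's implementation; the `CumulativeRiskMonitor` class, its thresholds, and the fixed risk scores are all assumptions standing in for a real moderation model.

```python
# Hypothetical sketch of a cumulative-risk defense against salami slicing:
# a turn is allowed only if it stays under a per-turn threshold AND the
# conversation's running risk total stays under a cumulative budget.
# Thresholds and scores are illustrative placeholders, not from the paper.

class CumulativeRiskMonitor:
    def __init__(self, per_turn_limit: float = 0.5, cumulative_limit: float = 1.5):
        self.per_turn_limit = per_turn_limit
        self.cumulative_limit = cumulative_limit
        self.total_risk = 0.0

    def allow(self, turn_risk: float) -> bool:
        """Return True if the turn may proceed; accumulate allowed risk."""
        if turn_risk >= self.per_turn_limit:
            return False  # caught by a conventional single-turn filter
        if self.total_risk + turn_risk >= self.cumulative_limit:
            return False  # caught by the cumulative budget
        self.total_risk += turn_risk
        return True


# Five low-risk turns (score 0.4 each) individually pass the per-turn
# filter, but the cumulative budget stops the chain at the fourth turn.
monitor = CumulativeRiskMonitor()
decisions = [monitor.allow(0.4) for _ in range(5)]
print(decisions)  # → [True, True, True, False, False]
```

The key point this sketch captures is that a per-turn filter alone would admit all five turns, whereas the cumulative budget blocks the chain once the accumulated risk approaches the threshold, which is precisely the gap the Salami Slicing Risk exploits.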
Key Contributions
- Introduces 'Salami Slicing Risk' concept: chaining low-risk inputs that individually evade detection but cumulatively trigger harmful outputs
- Develops Salami Attack framework universally applicable across model types and modalities without heavy context engineering
- Proposes a defense strategy that cuts the Salami Attack's success rate by at least 44.8% and blocks up to 64.8% of other multi-turn jailbreak attacks