
Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling

Deyue Zhang 1, Dongdong Yang 1, Junjie Mu 2, Quanchen Zou 1, Zonghao Ying 3, Wenzhuo Xu 1, Zhao Liu 1, Xuan Wang 1, Xiangzheng Zhang 1

1 citation · 43 references · arXiv


Published on arXiv: 2510.15068

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves 83.5% average attack success rate across 11 state-of-the-art MLLMs including GPT-5, Claude 4 Sonnet, and Gemini 2.5 Pro, surpassing prior SOTA visual jailbreak methods by 46%.

Sequential Comic Jailbreak (SCJ)

Novel technique introduced


Multimodal large language models (MLLMs) exhibit remarkable capabilities but remain susceptible to jailbreak attacks exploiting cross-modal vulnerabilities. In this work, we introduce a novel method that leverages sequential comic-style visual narratives to circumvent safety alignments in state-of-the-art MLLMs. Our method decomposes malicious queries into visually innocuous storytelling elements using an auxiliary LLM, generates corresponding image sequences through diffusion models, and exploits the models' reliance on narrative coherence to elicit harmful outputs. Extensive experiments on harmful textual queries from established safety benchmarks show that our approach achieves an average attack success rate of 83.5%, surpassing prior state-of-the-art by 46%. Compared with existing visual jailbreak methods, our sequential narrative strategy demonstrates superior effectiveness across diverse categories of harmful content. We further analyze attack patterns, uncover key vulnerability factors in multimodal safety mechanisms, and evaluate the limitations of current defense strategies against narrative-driven attacks, revealing significant gaps in existing protections.
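The three-stage pipeline the abstract describes (decompose a query into narrative beats with an auxiliary LLM, render each beat as a comic panel with a diffusion model, then present the sequence to the target MLLM) can be sketched structurally as below. This is a minimal, non-functional sketch: the function names, the `ComicPanel` type, and the placeholder panel text are illustrative assumptions, and the stubs do not reproduce the paper's prompts, models, or decomposition logic.

```python
from dataclasses import dataclass

@dataclass
class ComicPanel:
    caption: str        # innocuous storytelling text for one narrative beat
    image_prompt: str   # prompt that would be sent to a diffusion model

def decompose_into_story(query: str, n_panels: int = 4) -> list[ComicPanel]:
    """Stub for the auxiliary-LLM step: split a query into n narrative
    beats, each individually innocuous (here just numbered placeholders)."""
    return [
        ComicPanel(
            caption=f"Panel {i + 1} of a story about: {query}",
            image_prompt=f"comic panel {i + 1}, neutral illustrative scene",
        )
        for i in range(n_panels)
    ]

def render_panels(panels: list[ComicPanel]) -> list[str]:
    """Stub for the diffusion step: in the paper each image_prompt is
    rendered to an image; here we simply return the prompts."""
    return [p.image_prompt for p in panels]

# The rendered sequence would then be shown to the target MLLM, which the
# paper argues reconstructs the decomposed intent via narrative coherence.
panels = decompose_into_story("an example topic")
images = render_panels(panels)
print(len(panels), len(images))  # → 4 4
```

The sketch's only point is the division of labor: per-panel content stays innocuous, and the harmful signal exists only in the sequence as a whole, which is what the paper identifies as the gap in per-input safety filters.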


Key Contributions

  • Sequential Comic Jailbreak (SCJ): first attack to exploit narrative comprehension in MLLMs by decomposing malicious queries across diffusion-generated comic panels that are individually innocuous
  • Demonstrates that safety alignment asymmetry between visual and textual modalities can be exploited via sequential storytelling, achieving 83.5% average ASR across 11 state-of-the-art MLLMs
  • Evaluates failure modes of existing defenses (Llama Guard, LLaVA Guard) against narrative-driven attacks, exposing critical gaps in multimodal safety systems

🛡️ Threat Analysis

Input Manipulation Attack

Strategically crafted visual inputs (diffusion-generated comic panels) manipulate MLLM outputs at inference time by exploiting cross-modal vulnerabilities; this constitutes adversarial content manipulation of a VLM-integrated system.


Details

Domains
vision · nlp · multimodal · generative
Model Types
vlm · llm · diffusion
Threat Tags
black_box · inference_time · targeted · digital
Datasets
MM-SafetyBench · HADES
Applications
multimodal large language models · safety alignment bypass