attack 2025

STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

0 citations · 41 references · arXiv

Published on arXiv

2509.26473

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves up to a 93.06% attack success rate (ASR) on Gemini-2.0-Flash in a multi-turn jailbreak setting, surpassing the strongest prior baseline, FlipAttack.

STaR-Attack

Novel technique introduced

Unified Multimodal understanding and generation Models (UMMs) have demonstrated remarkable capabilities in both understanding and generation tasks. However, we identify a vulnerability arising from the generation-understanding coupling in UMMs. Attackers can use the generative function to craft an information-rich adversarial image and then leverage the understanding function to absorb it in a single pass, which we call Cross-Modal Generative Injection (CMGI). Current attack methods are often limited to a single modality and rely on prompt rewriting that introduces semantic drift, leaving the unique vulnerabilities of UMMs unexplored. We propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits unique safety weaknesses of UMMs without semantic drift. Specifically, our method defines a malicious event that is strongly correlated with the target query within a spatio-temporal context. Using the three-act narrative theory, STaR-Attack generates the pre-event and the post-event scenes while concealing the malicious event as the hidden climax. When executing the attack strategy, the opening two rounds exploit the UMM's generative ability to produce images for these scenes. Subsequently, an image-based question guessing and answering game is introduced by exploiting the understanding capability. STaR-Attack embeds the original malicious question among benign candidates, forcing the model to select and answer the most relevant one given the narrative context. Extensive experiments show that STaR-Attack consistently surpasses prior approaches, achieving up to a 93.06% attack success rate (ASR) on Gemini-2.0-Flash and surpassing the strongest prior baseline, FlipAttack. Our work uncovers a critical yet underexplored vulnerability and highlights the need for safety alignment in UMMs.
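The ASR figure quoted above is, by the usual convention, the percentage of evaluated queries on which an attack is judged to have succeeded. The sketch below only illustrates that arithmetic. The function name, the boolean outcome list, and the sample values are all hypothetical, and this summary does not say how the paper's judge decides success.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return ASR as a percentage.

    `outcomes` is a hypothetical list with one entry per evaluated query:
    True if a judge marked the attempt as successful, False otherwise.
    """
    if not outcomes:
        raise ValueError("need at least one evaluated query")
    # sum() counts the True entries because bool is a subclass of int.
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up outcomes for illustration only; these are not results from the paper.
illustrative = [True, True, False, True]
print(f"{attack_success_rate(illustrative):.2f}%")
```

When comparing ASR values across methods, the same query set and the same judging criterion need to be used, since either one changes the denominator or the numerator.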

Key Contributions

Identifies the Cross-Modal Generative Injection (CMGI) vulnerability unique to UMMs, where the model's own generative function is weaponized to craft inputs consumed by its understanding function in a single pass.
Proposes STaR-Attack, the first multi-turn jailbreak framework for UMMs, which uses a three-act narrative structure to conceal malicious intent as a 'hidden climax' between benign pre-event and post-event scenes, avoiding semantic drift.
Introduces a dynamic difficulty mechanism that adjusts candidate set size in the question-guessing game based on model performance, improving attack success rate and stability.

🛡️ Threat Analysis

Details

Domains

multimodal, nlp, vision

Model Types

vlm, multimodal, llm

Threat Tags

black_box, inference_time, targeted

Datasets

AdvBench

Applications

unified multimodal models, vision-language models, multimodal chatbots

Similar Papers

GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack

TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration

Models as Lego Builders: Assembling Malice from Benign Blocks via Semantic Blueprints

ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Jailbreaking Large Vision Language Models in Intelligent Transportation Systems