Attack · 2025

RunawayEvil: Jailbreaking the Image-to-Video Generative Models

Songping Wang¹, Rufan Qian¹, Yueming Lyu¹, Qinglong Liu¹, Linzhuang Zou¹, Jie Qin², Songhua Liu³, Caifeng Shan¹

2 citations · 33 references · arXiv


Published on arXiv: 2512.06674

Prompt Injection (OWASP LLM Top 10: LLM01)

Key Finding

RunawayEvil achieves state-of-the-art jailbreak success rates on the I2V models Open-Sora 2.0 and CogVideoX, outperforming existing methods by 58.5–79% on COCO2017.

Novel technique introduced: RunawayEvil


Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control. However, the security of such multimodal systems, particularly their vulnerability to jailbreak attacks, remains critically underexplored. To bridge this gap, we propose RunawayEvil, the first multimodal jailbreak framework for I2V models with dynamic evolutionary capability. Built on a "Strategy-Tactic-Action" paradigm, our framework achieves self-amplifying attacks through three core components: (1) a Strategy-Aware Command Unit, which enables the attack to self-evolve its strategies through reinforcement learning-driven strategy customization and LLM-based strategy exploration; (2) a Multimodal Tactical Planning Unit, which generates coordinated text jailbreak instructions and image tampering guidelines based on the selected strategies; and (3) a Tactical Action Unit, which executes and evaluates the coordinated multimodal attacks. This self-evolving architecture allows the framework to continuously adapt and intensify its attack strategies without human intervention. Extensive experiments demonstrate that RunawayEvil achieves state-of-the-art attack success rates on commercial I2V models such as Open-Sora 2.0 and CogVideoX. Specifically, RunawayEvil outperforms existing methods by 58.5 to 79 percent on COCO2017. This work provides a critical tool for vulnerability analysis of I2V models, thereby laying a foundation for more robust video generation systems.
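The results above are reported as attack success rates (ASR). A common definition, assumed here rather than taken from the paper, is the fraction of attempts whose generated video a safety judge flags as policy-violating. The summary also does not say whether the 58.5 to 79 percent margin is absolute (percentage points) or relative, so this minimal sketch computes both. `Trial`, `judged_unsafe`, `attack_success_rate`, and `margin` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluation attempt (hypothetical record format)."""
    prompt_id: str
    judged_unsafe: bool  # assumed verdict from some external safety judge


def attack_success_rate(trials: list[Trial]) -> float:
    """ASR as the fraction of trials judged unsafe (assumed definition)."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(t.judged_unsafe for t in trials) / len(trials)


def margin(asr_method: float, asr_baseline: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative gap in percent)."""
    absolute_pp = 100 * (asr_method - asr_baseline)
    relative_pct = (
        100 * (asr_method - asr_baseline) / asr_baseline
        if asr_baseline > 0
        else float("inf")  # relative gain is undefined for a zero baseline
    )
    return absolute_pp, relative_pct


if __name__ == "__main__":
    trials = [Trial("a", True), Trial("b", True), Trial("c", False), Trial("d", True)]
    asr = attack_success_rate(trials)
    print(asr, margin(asr, 0.25))
```

With three of four trials flagged, the ASR is 0.75; against a baseline of 0.25 that is a 50-point absolute gap but a 200% relative gain, which is why the unit of a reported margin matters.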


Key Contributions

  • RunawayEvil: first multimodal jailbreak framework targeting I2V generative models, structured around a Strategy-Tactic-Action paradigm
  • Self-evolving attack strategy via reinforcement learning-driven strategy customization and LLM-based strategy exploration, requiring no human intervention
  • Coordinated multimodal attack unit generating aligned text jailbreak instructions and image tampering guidelines that collectively bypass I2V safety mechanisms

Details

Domains
multimodal, generative, vision, nlp
Model Types
diffusion, vlm, llm
Threat Tags
black_box, inference_time, targeted
Datasets
COCO2017
Applications
image-to-video generation, video generation