T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
Changzhen Li 1,2, Yuecong Min 2,3, Jie Zhang 2,3, Zheng Yuan 2,3, Shiguang Shan 1,2, Xilin Chen 2,3
Published on arXiv
2512.23953
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Single-word prompt modifications (substitution or insertion) cause substantial degradation in semantic fidelity and temporal dynamics across state-of-the-art T2V models including CogVideoX and HunyuanVideo.
T2VAttack
Novel technique introduced
The rapid evolution of Text-to-Video (T2V) diffusion models has driven remarkable advancements in generating high-quality, temporally coherent videos from natural language descriptions. Despite these achievements, their vulnerability to adversarial attacks remains largely unexplored. In this paper, we introduce T2VAttack, a comprehensive study of adversarial attacks on T2V diffusion models from both semantic and temporal perspectives. Considering the inherently dynamic nature of video data, we propose two distinct attack objectives: a semantic objective to evaluate video-text alignment and a temporal objective to assess temporal dynamics. To achieve an effective and efficient attack process, we propose two adversarial attack methods: (i) T2VAttack-S, which identifies semantically or temporally critical words in prompts and replaces them with synonyms via greedy search, and (ii) T2VAttack-I, which iteratively inserts optimized words with minimal perturbation to the prompt. By combining these objectives and strategies, we conduct a comprehensive evaluation of the adversarial robustness of several state-of-the-art T2V models, including ModelScope, CogVideoX, Open-Sora, and HunyuanVideo. Our experiments reveal that even minor prompt modifications, such as the substitution or insertion of a single word, can cause substantial degradation in semantic fidelity and temporal dynamics, highlighting critical vulnerabilities in current T2V diffusion models.
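The greedy synonym substitution described for T2VAttack-S can be illustrated with a minimal sketch. This is not the authors' implementation: the function `greedy_substitute`, the `synonyms` table, the deletion-based word ranking, and the stand-in `toy_score` are all hypothetical. In a real attack, `score_fn` would generate a video from the prompt and return a semantic or temporal score, where lower means more degradation.

```python
from typing import Callable, Dict, List


def greedy_substitute(prompt: str,
                      synonyms: Dict[str, List[str]],
                      score_fn: Callable[[str], float],
                      budget: int = 1) -> str:
    """Replace up to `budget` words with synonyms that lower score_fn.

    Assumption: word importance is estimated by the score drop when the
    word is deleted. The paper's ranking criterion may differ.
    """
    words = prompt.split()
    base = score_fn(prompt)

    def importance(i: int) -> float:
        ablated = " ".join(words[:i] + words[i + 1:])
        return base - score_fn(ablated)

    # Visit the most influential words first.
    order = sorted(range(len(words)), key=importance, reverse=True)

    best_score = base
    changed = 0
    for i in order:
        if changed >= budget:
            break
        best_word = None
        for cand in synonyms.get(words[i], []):
            trial = words[:i] + [cand] + words[i + 1:]
            s = score_fn(" ".join(trial))
            if s < best_score:  # keep only substitutions that hurt the score
                best_score, best_word = s, cand
        if best_word is not None:
            words[i] = best_word
            changed += 1
    return " ".join(words)


def toy_score(prompt: str) -> float:
    """Hypothetical stand-in for a video-text alignment score."""
    weights = {"dog": 1.0, "runs": 0.8, "hound": 0.6, "jogs": 0.3}
    return sum(weights.get(w, 0.1) for w in prompt.split())


if __name__ == "__main__":
    table = {"dog": ["hound"], "runs": ["jogs"]}
    print(greedy_substitute("a dog runs fast", table, toy_score, budget=1))
```

With `budget=1` the sketch changes a single word, which mirrors the single-word setting highlighted in the abstract. Each call to `score_fn` would correspond to one video generation in practice, so the number of queries grows with the prompt length and the number of synonym candidates.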
Key Contributions
- Semantic and temporal attack objectives that quantify adversarial degradation in T2V models using video-text alignment and optical flow metrics
- T2VAttack-S: greedy synonym-substitution attack identifying semantically/temporally critical words in prompts
- T2VAttack-I: iterative word-insertion attack that optimizes minimal prompt perturbations to maximize degradation
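The word-insertion idea behind T2VAttack-I can be sketched in the same black-box style. The summary does not say how inserted words are optimized, so this sketch swaps in an exhaustive search over a small candidate vocabulary and every insertion position. The names `greedy_insert`, `vocab`, and `toy_motion_score` are hypothetical, and `score_fn` again stands in for a metric computed on the generated video, such as a video-text alignment or optical-flow-based score.

```python
from typing import Callable, List


def greedy_insert(prompt: str,
                  vocab: List[str],
                  score_fn: Callable[[str], float],
                  max_insertions: int = 1) -> str:
    """Insert up to `max_insertions` words, one per round, to lower score_fn.

    Assumption: candidates come from a fixed vocabulary and are searched
    exhaustively. This is a simplification, not the paper's optimizer.
    """
    words = prompt.split()
    best_score = score_fn(prompt)
    for _ in range(max_insertions):
        best_trial = None
        for pos in range(len(words) + 1):  # every gap, including both ends
            for cand in vocab:
                trial = words[:pos] + [cand] + words[pos:]
                s = score_fn(" ".join(trial))
                if s < best_score:
                    best_score, best_trial = s, trial
        if best_trial is None:  # no insertion lowers the score further
            break
        words = best_trial
    return " ".join(words)


def toy_motion_score(prompt: str) -> float:
    """Hypothetical stand-in for a temporal-dynamics score."""
    words = prompt.split()
    score = 1.0
    if "frozen" in words:
        score -= 0.6
    if "blurry" in words:
        score -= 0.2
    return score


if __name__ == "__main__":
    adv = greedy_insert("a dog runs", ["blurry", "frozen", "bright"],
                        toy_motion_score, max_insertions=1)
    print(adv, toy_motion_score(adv))
```

Stopping as soon as no candidate lowers the score keeps the perturbation small, in line with the minimal-perturbation goal stated for T2VAttack-I.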
🛡️ Threat Analysis
Proposes adversarial input manipulation attacks (T2VAttack-S via synonym substitution, T2VAttack-I via word insertion) that craft perturbed text prompts, causing T2V diffusion models to produce degraded outputs at inference time. This is an evasion-style input manipulation attack on generative models, not an LLM jailbreak.