attack 2025

T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models

Changzhen Li 1,2, Yuecong Min 2,3, Jie Zhang 2,3, Zheng Yuan 2,3, Shiguang Shan 1,2, Xilin Chen 2,3

0 citations · 77 references · arXiv

Published on arXiv

2512.23953

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Single-word prompt modifications (substitution or insertion) cause substantial degradation in semantic fidelity and temporal dynamics across state-of-the-art T2V models including CogVideoX and HunyuanVideo.

T2VAttack

Novel technique introduced


The rapid evolution of Text-to-Video (T2V) diffusion models has driven remarkable advancements in generating high-quality, temporally coherent videos from natural language descriptions. Despite these achievements, their vulnerability to adversarial attacks remains largely unexplored. In this paper, we introduce T2VAttack, a comprehensive study of adversarial attacks on T2V diffusion models from both semantic and temporal perspectives. Considering the inherently dynamic nature of video data, we propose two distinct attack objectives: a semantic objective to evaluate video-text alignment and a temporal objective to assess temporal dynamics. To achieve an effective and efficient attack process, we propose two adversarial attack methods: (i) T2VAttack-S, which identifies semantically or temporally critical words in prompts and replaces them with synonyms via greedy search, and (ii) T2VAttack-I, which iteratively inserts optimized words with minimal perturbation to the prompt. By combining these objectives and strategies, we conduct a comprehensive evaluation of the adversarial robustness of several state-of-the-art T2V models, including ModelScope, CogVideoX, Open-Sora, and HunyuanVideo. Our experiments reveal that even minor prompt modifications, such as the substitution or insertion of a single word, can cause substantial degradation in semantic fidelity and temporal dynamics, highlighting critical vulnerabilities in current T2V diffusion models.
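To make the two attack objectives concrete, here is a minimal Python sketch of how a semantic score and a temporal score could be computed for a generated video. It is an illustration under stated assumptions, not the paper's implementation: the embeddings are assumed to come from some text-video alignment model that is not shown, and the mean inter-frame difference is only a crude stand-in for an optical-flow magnitude. The names `semantic_score` and `temporal_score` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_score(text_emb: Sequence[float],
                   frame_embs: Sequence[Sequence[float]]) -> float:
    """Mean text-to-frame cosine similarity.

    Assumption: text_emb and frame_embs come from a shared embedding space
    produced by an alignment model that is outside this sketch.
    """
    return sum(cosine_similarity(text_emb, f) for f in frame_embs) / len(frame_embs)


def temporal_score(frames: Sequence[Frame]) -> float:
    """Mean absolute difference between consecutive frames.

    Assumption: this is a simple proxy for motion, standing in for a real
    optical-flow metric.
    """
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for row_prev, row_cur in zip(prev, cur):
            for p, c in zip(row_prev, row_cur):
                total += abs(c - p)
                count += 1
    return total / count if count else 0.0
```

Under this framing, an attack succeeds when a small prompt change lowers one of these scores for the video generated from the perturbed prompt, relative to the video generated from the original prompt.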


Key Contributions

  • Semantic and temporal attack objectives that quantify adversarial degradation in T2V models using video-text alignment and optical flow metrics
  • T2VAttack-S: greedy synonym-substitution attack identifying semantically/temporally critical words in prompts
  • T2VAttack-I: iterative word-insertion attack that optimizes minimal prompt perturbations to maximize degradation
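The two attack strategies listed above can be sketched as black-box searches over prompt edits. The Python below is a simplified illustration, not the authors' algorithm: it assumes a hypothetical `score` callable that maps a prompt to a quality score (lower means more degraded), folds the "critical word" identification into a brute-force scan over all positions, and replaces the paper's word optimization for insertion with an exhaustive search over a small candidate list. The `toy_score` function and its weights are invented purely so the example runs without a T2V model.

```python
from typing import Callable, Dict, List, Optional, Sequence

ScoreFn = Callable[[str], float]  # prompt -> quality score; lower = more degraded


def greedy_substitute(prompt: str, synonyms: Dict[str, Sequence[str]],
                      score: ScoreFn, budget: int = 1) -> str:
    """Greedily replace up to `budget` words with the synonym that lowers the score most."""
    words = prompt.split()
    best = score(prompt)
    for _ in range(budget):
        best_trial: Optional[List[str]] = None
        for i, word in enumerate(words):
            for syn in synonyms.get(word, []):
                trial = words[:i] + [syn] + words[i + 1:]
                value = score(" ".join(trial))
                if value < best:  # keep only strict improvements
                    best, best_trial = value, trial
        if best_trial is None:  # no substitution helps; stop early
            break
        words = best_trial
    return " ".join(words)


def iterative_insert(prompt: str, candidates: Sequence[str],
                     score: ScoreFn, budget: int = 1) -> str:
    """Insert up to `budget` candidate words, one per round, at the most damaging position."""
    words = prompt.split()
    best = score(prompt)
    for _ in range(budget):
        best_trial: Optional[List[str]] = None
        for pos in range(len(words) + 1):
            for token in candidates:
                trial = words[:pos] + [token] + words[pos:]
                value = score(" ".join(trial))
                if value < best:
                    best, best_trial = value, trial
        if best_trial is None:
            break
        words = best_trial
    return " ".join(words)


def toy_score(prompt: str) -> float:
    """Invented stand-in for 'generate a video, then measure it'."""
    weights = {"running": 1.0, "dog": 0.8, "jogging": 0.4, "slowly": -0.5}
    return sum(weights.get(w, 0.1) for w in prompt.split())


if __name__ == "__main__":
    original = "a dog running fast"
    print(greedy_substitute(original, {"running": ["jogging", "sprinting"],
                                       "dog": ["hound"]}, toy_score))
    print(iterative_insert(original, ["slowly", "quickly"], toy_score))
```

In a real setting each call to `score` would require generating a video, so the number of candidate edits per round is the dominant cost; restricting the search to a few important words, as the paper's description of T2VAttack-S suggests, is one way to keep that cost down.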

🛡️ Threat Analysis

Input Manipulation Attack

Proposes adversarial input manipulation attacks (T2VAttack-S via synonym substitution, T2VAttack-I via word insertion) that craft perturbed text prompts causing T2V diffusion models to produce degraded outputs at inference time — an evasion/input manipulation attack on generative models, not an LLM jailbreak.


Details

Domains
multimodal, generative
Model Types
diffusion
Threat Tags
black_box, inference_time, targeted, digital
Evaluated Models
ModelScope, CogVideoX, Open-Sora, HunyuanVideo
Applications
text-to-video generation