T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models

The rapid evolution of Text-to-Video (T2V) diffusion models has driven remarkable advances in generating high-quality, temporally coherent videos from natural language descriptions. Despite these achievements, their vulnerability to adversarial attacks remains largely unexplored. In this paper, we introduce T2VAttack, a comprehensive study of adversarial attacks on T2V diffusion models from both semantic and temporal perspectives. Because video data is inherently dynamic, we propose two distinct attack objectives: a semantic objective that evaluates video-text alignment and a temporal objective that assesses temporal dynamics. To make the attack both effective and efficient, we propose two adversarial attack methods: (i) T2VAttack-S, which identifies semantically or temporally critical words in prompts and replaces them with synonyms via greedy search, and (ii) T2VAttack-I, which iteratively inserts optimized words with minimal perturbation to the prompt. By combining these objectives and strategies, we conduct a comprehensive evaluation of the adversarial robustness of several state-of-the-art T2V models, including ModelScope, CogVideoX, Open-Sora, and HunyuanVideo. Our experiments reveal that even minor prompt modifications, such as the substitution or insertion of a single word, can cause substantial degradation in semantic fidelity and temporal dynamics, highlighting critical vulnerabilities in current T2V diffusion models.
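
As a rough illustration of the two objectives, the sketch below scores a generated video in two ways: a semantic score taken as the mean cosine similarity between per-frame embeddings and the prompt embedding, and a temporal score taken as the mean optical-flow magnitude. The function names `semantic_score` and `temporal_score`, the embedding inputs, and the flow representation are all assumptions made for illustration; the paper's exact metrics may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_score(frame_embs: List[Vector], text_emb: Vector) -> float:
    """Hypothetical semantic objective: mean frame-to-prompt cosine similarity.

    Assumes a non-empty list of frame embeddings from some video-text encoder.
    """
    return sum(cosine(f, text_emb) for f in frame_embs) / len(frame_embs)


def temporal_score(flows: List[List[Tuple[float, float]]]) -> float:
    """Hypothetical temporal objective: mean optical-flow magnitude.

    `flows` holds one list of (dx, dy) flow vectors per consecutive frame pair.
    """
    mags = [math.hypot(dx, dy) for field in flows for dx, dy in field]
    return sum(mags) / len(mags)


# Toy inputs: two frames, one aligned with the prompt and one orthogonal to it.
sem = semantic_score([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
# Toy flow field: one moving pixel (magnitude 5) and one static pixel.
tmp = temporal_score([[(3.0, 4.0), (0.0, 0.0)]])
```

Under this framing, an attack succeeds when a perturbed prompt lowers the chosen score relative to the clean prompt.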

Key Contributions

Semantic and temporal attack objectives that quantify adversarial degradation in T2V models using video-text alignment and optical flow metrics
T2VAttack-S: greedy synonym-substitution attack identifying semantically/temporally critical words in prompts
T2VAttack-I: iterative word-insertion attack that optimizes minimal prompt perturbations to maximize degradation
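
The greedy synonym-substitution idea behind T2VAttack-S can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `score` is a hypothetical black-box callable that would, in practice, generate a video from the prompt and return a semantic or temporal objective value (lower means more degraded); word importance is estimated here by deleting each word, which is one common heuristic rather than a detail confirmed by the source; and `synonyms` is a hypothetical lookup table.

```python
from typing import Callable, Dict, List, Tuple


def greedy_synonym_attack(
    prompt: str,
    synonyms: Dict[str, List[str]],
    score: Callable[[str], float],
    max_swaps: int = 1,
) -> Tuple[str, float]:
    """Greedily replace the most critical words with score-lowering synonyms."""
    words = prompt.split()
    best_score = score(prompt)

    def without(i: int) -> str:
        return " ".join(words[:i] + words[i + 1:])

    # Rank positions by importance: deleting a critical word lowers the score most.
    ranked = sorted(range(len(words)), key=lambda i: score(without(i)))

    swaps = 0
    for i in ranked:
        if swaps >= max_swaps:
            break
        best_word = None
        for cand in synonyms.get(words[i].lower(), []):
            trial = " ".join(words[:i] + [cand] + words[i + 1:])
            s = score(trial)
            if s < best_score:  # keep only strict improvements for the attacker
                best_score, best_word = s, cand
        if best_word is not None:
            words[i] = best_word
            swaps += 1
    return " ".join(words), best_score


def toy_score(p: str) -> float:
    """Stand-in for a real objective: weighted count of original keywords."""
    weights = {"dog": 6.0, "running": 3.0, "park": 1.0}
    tokens = p.lower().split()
    return sum(w for k, w in weights.items() if k in tokens)


prompt = "a dog running in the park"
table = {"dog": ["hound", "puppy"], "running": ["jogging"]}
adv_prompt, adv_score = greedy_synonym_attack(prompt, table, toy_score)
```

With the toy objective, the search singles out "dog" as the most critical word and swaps it for the first synonym that lowers the score. A real objective would require one video generation per candidate, which is why a greedy search with a small swap budget keeps the query count manageable.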

🛡️ Threat Analysis

Input Manipulation Attack

Proposes two adversarial input manipulation attacks (T2VAttack-S via synonym substitution, T2VAttack-I via word insertion) that perturb text prompts so that T2V diffusion models produce degraded outputs at inference time. This is an evasion (input manipulation) attack on generative models, not an LLM jailbreak.
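
The word-insertion variant can be illustrated in the same black-box setting. The sketch below is an assumption-laden stand-in: the source says only that T2VAttack-I iteratively inserts optimized words, so the exhaustive search over a small hypothetical vocabulary `vocab`, the `score` callable, and the toy motion proxy are illustrative choices, not the paper's optimization procedure.

```python
from typing import Callable, List, Tuple


def insertion_attack(
    prompt: str,
    vocab: List[str],
    score: Callable[[str], float],
    budget: int = 1,
) -> Tuple[str, float]:
    """Insert up to `budget` words, each chosen to lower the score the most."""
    words = prompt.split()
    best_score = score(prompt)
    for _ in range(budget):
        best_trial = None
        # Try every candidate word at every insertion position.
        for pos in range(len(words) + 1):
            for w in vocab:
                trial = words[:pos] + [w] + words[pos:]
                s = score(" ".join(trial))
                if s < best_score:
                    best_score, best_trial = s, trial
        if best_trial is None:  # no insertion helps; stop early
            break
        words = best_trial
    return " ".join(words), best_score


def toy_motion(p: str) -> float:
    """Stand-in for a temporal objective: a made-up motion level for the prompt."""
    tokens = p.lower().split()
    motion = 10.0 if "running" in tokens else 0.0
    if "motionless" in tokens:
        motion -= 7.0
    if "slowly" in tokens:
        motion -= 3.0
    return motion


clean = "a dog running in the park"
adv, adv_motion = insertion_attack(clean, ["slowly", "motionless", "blue"], toy_motion)
```

Each round costs `len(vocab) * (len(words) + 1)` score queries, so a practical attack would need a small candidate set or a cheaper surrogate objective.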

Details

Domains

multimodal, generative

Model Types

diffusion

Threat Tags

black_box, inference_time, targeted, digital

Datasets

ModelScope, CogVideoX, Open-Sora, HunyuanVideo

Applications

2026 0 cit.
