VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji
Published on arXiv (arXiv:2508.15314)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
VideoEraser reduces undesirable concept generation by 46% on average across four erasure tasks compared to prior baselines, while maintaining video fidelity and generalizability.
VideoEraser
Novel technique introduced
The rapid growth of text-to-video (T2V) diffusion models has raised concerns about privacy, copyright, and safety due to their potential misuse in generating harmful or misleading content. These models are often trained on large-scale datasets that may contain unauthorized personal identities, artistic creations, and harmful material, which can lead to the uncontrolled production and distribution of such content. To address this, we propose VideoEraser, a training-free framework that prevents T2V diffusion models from generating videos with undesirable concepts, even when explicitly prompted with those concepts. Designed as a plug-and-play module, VideoEraser integrates seamlessly with representative T2V diffusion models via a two-stage process: Selective Prompt Embedding Adjustment (SPEA) and Adversarial-Resilient Noise Guidance (ARNG). We conduct extensive evaluations across four tasks: object erasure, artistic style erasure, celebrity erasure, and explicit content erasure. Experimental results show that VideoEraser consistently outperforms prior methods in efficacy, integrity, fidelity, robustness, and generalizability. Notably, VideoEraser achieves state-of-the-art performance in suppressing undesirable content during T2V generation, reducing it by 46% on average across the four tasks compared to baselines.
Key Contributions
- Selective Prompt Embedding Adjustment (SPEA): training-free mechanism that steers prompt embeddings away from undesirable concept representations at inference time
- Adversarial-Resilient Noise Guidance (ARNG): gradient-guided noise estimation that maintains concept erasure even under adversarially crafted prompts
- Plug-and-play framework evaluated across four erasure tasks (object, style, celebrity, explicit content), reducing undesirable content by 46% on average over baselines
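The paper does not publish SPEA's exact formulation here, but the idea of steering prompt embeddings away from an undesirable concept at inference time can be illustrated as a vector projection. The following is a minimal, hypothetical sketch (the function name, `strength` parameter, and projection formula are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def adjust_prompt_embedding(prompt_emb: np.ndarray,
                            concept_emb: np.ndarray,
                            strength: float = 1.0) -> np.ndarray:
    """Hypothetical SPEA-style adjustment: remove the component of the
    prompt embedding that lies along the undesirable concept direction."""
    direction = concept_emb / np.linalg.norm(concept_emb)
    # Component of the prompt embedding along the concept direction.
    projection = np.dot(prompt_emb, direction) * direction
    # strength=1.0 fully removes the concept component; smaller values
    # attenuate it, trading erasure efficacy for prompt fidelity.
    return prompt_emb - strength * projection

# Toy example: a 2-D "prompt" with a component along a 2-D "concept".
prompt = np.array([1.0, 1.0])
concept = np.array([1.0, 0.0])
adjusted = adjust_prompt_embedding(prompt, concept)  # → [0.0, 1.0]
```

In practice the embeddings would come from the T2V model's text encoder, and the adjustment would be applied selectively (e.g., only to token embeddings associated with the target concept), which is what makes the real mechanism "selective" rather than a blanket projection.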
🛡️ Threat Analysis
VideoEraser's core contribution is ensuring the integrity and safety of T2V model outputs by suppressing undesirable concepts (explicit content, celebrity identities, copyrighted styles, objects) during generation — a form of output content integrity/safety control. The ARNG component adds robustness so that adversarially crafted prompts cannot circumvent the content erasure, directly addressing output integrity under adversarial conditions.
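ARNG's exact update rule is not given in this summary, but guidance-based erasure during denoising is commonly expressed as a classifier-free-guidance-style combination of noise predictions, with an extra term pushing away from the concept-conditioned prediction. The sketch below is a hedged illustration of that general pattern (the function name, the `erase_scale` term, and the specific linear combination are assumptions, not the paper's formula):

```python
import numpy as np

def erasure_guided_noise(eps_uncond: np.ndarray,
                         eps_prompt: np.ndarray,
                         eps_concept: np.ndarray,
                         guidance_scale: float = 7.5,
                         erase_scale: float = 3.0) -> np.ndarray:
    """Hypothetical ARNG-style guidance: steer the denoiser's noise
    estimate toward the user prompt and away from the erased concept.

    eps_* are the diffusion model's noise predictions under the
    unconditional, prompt-conditioned, and concept-conditioned inputs.
    """
    return (eps_uncond
            + guidance_scale * (eps_prompt - eps_uncond)   # follow the prompt
            - erase_scale * (eps_concept - eps_uncond))    # repel the concept

# Toy 1-step example with scalar-like tensors.
eps_u = np.zeros(4)
eps_p = np.ones(4)
eps_c = 0.5 * np.ones(4)
out = erasure_guided_noise(eps_u, eps_p, eps_c,
                           guidance_scale=1.0, erase_scale=1.0)  # → [0.5]*4
```

Applying the repulsion term at every denoising step is what gives this family of methods robustness to adversarial prompts: even if a jailbreak prompt evades the text-embedding stage, the concept direction is still suppressed in noise space.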