VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji
Published on arXiv (arXiv:2508.15314)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
VideoEraser reduces undesirable concept generation by 46% on average across four erasure tasks compared to prior baselines, while maintaining video fidelity and generalizability.
VideoEraser
Novel technique introduced
The rapid growth of text-to-video (T2V) diffusion models has raised concerns about privacy, copyright, and safety due to their potential misuse in generating harmful or misleading content. These models are often trained on large-scale datasets that may contain unauthorized personal identities, artistic creations, and harmful material, which can lead to the uncontrolled production and distribution of such content. To address this, we propose VideoEraser, a training-free framework that prevents T2V diffusion models from generating videos with undesirable concepts, even when explicitly prompted with those concepts. Designed as a plug-and-play module, VideoEraser integrates seamlessly with representative T2V diffusion models via a two-stage process: Selective Prompt Embedding Adjustment (SPEA) and Adversarial-Resilient Noise Guidance (ARNG). We conduct extensive evaluations across four tasks: object erasure, artistic style erasure, celebrity erasure, and explicit content erasure. Experimental results show that VideoEraser consistently outperforms prior methods in efficacy, integrity, fidelity, robustness, and generalizability. Notably, VideoEraser achieves state-of-the-art performance in suppressing undesirable content during T2V generation, reducing it by 46% on average across the four tasks compared to baselines.
Key Contributions
- Selective Prompt Embedding Adjustment (SPEA): training-free mechanism that steers prompt embeddings away from undesirable concept representations at inference time
- Adversarial-Resilient Noise Guidance (ARNG): gradient-guided noise estimation that maintains concept erasure even under adversarially crafted prompts
- Plug-and-play framework evaluated across four erasure tasks (object, style, celebrity, explicit content), reducing undesirable content by 46% on average over baselines
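The paper does not publish SPEA's exact formulation here, but the idea of steering prompt embeddings away from an undesirable concept at inference time can be illustrated as a vector projection. The following is a minimal, hypothetical sketch (the function name, `strength` parameter, and projection formula are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def adjust_prompt_embedding(prompt_emb: np.ndarray,
                            concept_emb: np.ndarray,
                            strength: float = 1.0) -> np.ndarray:
    """Hypothetical SPEA-style adjustment: remove the component of the
    prompt embedding that lies along the undesirable concept direction."""
    direction = concept_emb / np.linalg.norm(concept_emb)
    # Component of the prompt embedding along the concept direction.
    projection = np.dot(prompt_emb, direction) * direction
    # strength=1.0 fully removes the concept component; smaller values
    # attenuate it, trading erasure efficacy for prompt fidelity.
    return prompt_emb - strength * projection

# Toy example: a 2-D "prompt" with a component along a 2-D "concept".
prompt = np.array([1.0, 1.0])
concept = np.array([1.0, 0.0])
adjusted = adjust_prompt_embedding(prompt, concept)  # → [0.0, 1.0]
```

In practice the embeddings would come from the T2V model's text encoder, and the adjustment would be applied selectively (e.g., only to token embeddings associated with the target concept), which is what makes the real mechanism "selective" rather than a blanket projection.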
🛡️ Threat Analysis
VideoEraser's core contribution is ensuring the integrity and safety of T2V model outputs by suppressing undesirable concepts (explicit content, celebrity identities, copyrighted styles, objects) during generation — a form of output content integrity/safety control. The ARNG component adds robustness so that adversarially crafted prompts cannot circumvent the content erasure, directly addressing output integrity under adversarial conditions.
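ARNG's exact update rule is not given in this summary, but guidance-based erasure during denoising is commonly expressed as a classifier-free-guidance-style combination of noise predictions, with an extra term pushing away from the concept-conditioned prediction. The sketch below is a hedged illustration of that general pattern (the function name, the `erase_scale` term, and the specific linear combination are assumptions, not the paper's formula):

```python
import numpy as np

def erasure_guided_noise(eps_uncond: np.ndarray,
                         eps_prompt: np.ndarray,
                         eps_concept: np.ndarray,
                         guidance_scale: float = 7.5,
                         erase_scale: float = 3.0) -> np.ndarray:
    """Hypothetical ARNG-style guidance: steer the denoiser's noise
    estimate toward the user prompt and away from the erased concept.

    eps_* are the diffusion model's noise predictions under the
    unconditional, prompt-conditioned, and concept-conditioned inputs.
    """
    return (eps_uncond
            + guidance_scale * (eps_prompt - eps_uncond)   # follow the prompt
            - erase_scale * (eps_concept - eps_uncond))    # repel the concept

# Toy 1-step example with scalar-like tensors.
eps_u = np.zeros(4)
eps_p = np.ones(4)
eps_c = 0.5 * np.ones(4)
out = erasure_guided_noise(eps_u, eps_p, eps_c,
                           guidance_scale=1.0, erase_scale=1.0)  # → [0.5]*4
```

Applying the repulsion term at every denoising step is what gives this family of methods robustness to adversarial prompts: even if a jailbreak prompt evades the text-embedding stage, the concept direction is still suppressed in noise space.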