
Published on arXiv

2508.15314

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

VideoEraser reduces undesirable concept generation by 46% on average across four erasure tasks compared to prior baselines, while maintaining video fidelity and generalizability.

VideoEraser

Novel technique introduced


The rapid growth of text-to-video (T2V) diffusion models has raised concerns about privacy, copyright, and safety due to their potential misuse in generating harmful or misleading content. These models are often trained on numerous datasets, including unauthorized personal identities, artistic creations, and harmful materials, which can lead to uncontrolled production and distribution of such content. To address this, we propose VideoEraser, a training-free framework that prevents T2V diffusion models from generating videos with undesirable concepts, even when explicitly prompted with those concepts. Designed as a plug-and-play module, VideoEraser can seamlessly integrate with representative T2V diffusion models via a two-stage process: Selective Prompt Embedding Adjustment (SPEA) and Adversarial-Resilient Noise Guidance (ARNG). We conduct extensive evaluations across four tasks, including object erasure, artistic style erasure, celebrity erasure, and explicit content erasure. Experimental results show that VideoEraser consistently outperforms prior methods regarding efficacy, integrity, fidelity, robustness, and generalizability. Notably, VideoEraser achieves state-of-the-art performance in suppressing undesirable content during T2V generation, reducing it by 46% on average across four tasks compared to baselines.


Key Contributions

  • Selective Prompt Embedding Adjustment (SPEA): training-free mechanism that steers prompt embeddings away from undesirable concept representations at inference time
  • Adversarial-Resilient Noise Guidance (ARNG): gradient-guided noise estimation that maintains concept erasure even under adversarially crafted prompts
  • Plug-and-play framework evaluated across four erasure tasks (object, style, celebrity, explicit content), reducing undesirable content by 46% on average over baselines
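
The two mechanisms listed above can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not give the SPEA or ARNG formulas, so the code below swaps in two generic stand-ins, namely projecting a concept direction out of a prompt embedding, and adding a negative-guidance term to classifier-free guidance. All function names, parameters (`strength`, `scale`, `erase_scale`), and vectors are hypothetical and chosen only for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer_away(prompt_emb, concept_emb, strength=1.0):
    """Hypothetical stand-in for embedding adjustment (not the paper's SPEA).

    Subtracts `strength` times the projection of `prompt_emb` onto
    `concept_emb`. With strength=1.0 the result is orthogonal to the concept.
    """
    norm_sq = dot(concept_emb, concept_emb)
    if norm_sq == 0.0:
        return list(prompt_emb)  # nothing to remove for a zero concept vector
    coeff = strength * dot(prompt_emb, concept_emb) / norm_sq
    return [p - coeff * c for p, c in zip(prompt_emb, concept_emb)]


def guided_noise(eps_uncond, eps_prompt, eps_concept, scale=7.5, erase_scale=1.0):
    """Hypothetical stand-in for noise guidance (not the paper's ARNG).

    Standard classifier-free guidance toward the prompt, minus a term that
    pushes the estimate away from the concept-conditioned prediction.
    """
    return [
        u + scale * (p - u) - erase_scale * (c - u)
        for u, p, c in zip(eps_uncond, eps_prompt, eps_concept)
    ]


# Toy 3-dimensional "embeddings"; real text encoders use far larger tensors.
prompt = [0.8, 0.6, 0.1]
concept = [1.0, 0.0, 0.0]
adjusted = steer_away(prompt, concept)

# Toy 2-dimensional noise predictions for a single denoising step.
eps = guided_noise([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], scale=2.0, erase_scale=1.0)
```

In this toy setup, `adjusted` has no remaining component along `concept`, and `eps` is pulled toward the prompt prediction while being pushed away from the concept prediction. Both stand-ins run only at inference time and leave model weights untouched, which is the property the summary attributes to VideoEraser as a training-free, plug-and-play module.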

🛡️ Threat Analysis

Output Integrity Attack

VideoEraser's core contribution is ensuring the integrity and safety of T2V model outputs by suppressing undesirable concepts (explicit content, celebrity identities, copyrighted styles, objects) during generation — a form of output content integrity/safety control. The ARNG component adds robustness so that adversarially crafted prompts cannot circumvent the content erasure, directly addressing output integrity under adversarial conditions.


Details

Domains
generative, vision
Model Types
diffusion
Threat Tags
inference_time
Applications
text-to-video generation, content safety, copyright protection, celebrity identity protection