
SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models

Yang Yang 1, Xinze Zou 1, Zehua Ma 2, Han Fang 3, Weiming Zhang 2


Published on arXiv: 2603.00194

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SKeDA outperforms existing baselines in both fidelity and traceability under video-specific distortions, including inter-frame compression, frame deletion, and noise.

SKeDA

Novel technique introduced


The rise of text-to-video generation models has raised growing concerns over content authenticity, copyright protection, and malicious misuse. Watermarking serves as an effective mechanism for regulating such AI-generated content, where high fidelity and strong robustness are particularly critical. Recent generative image watermarking methods provide a promising foundation by leveraging watermark information and pseudo-random keys to control the initial sampling noise, enabling lossless embedding. However, directly extending these techniques to video introduces two key limitations: (1) existing designs implicitly rely on strict alignment between video frames and the frame-dependent pseudo-random binary sequences used for watermark encryption, and once this alignment is disrupted, watermark extraction becomes unreliable; and (2) video-specific distortions, such as inter-frame compression, significantly degrade watermark reliability. To address these issues, we propose SKeDA, a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe), which employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation; this design transforms watermark extraction from synchronization-sensitive sequence decoding into permutation-tolerant set-level aggregation, substantially improving robustness against frame reordering and loss; and (2) Differential Attention (DA), which computes inter-frame differences and dynamically adjusts attention weights during extraction, enhancing robustness against temporal distortions. Extensive experiments demonstrate that SKeDA preserves high video generation quality while achieving strong watermark robustness.
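The key property SKe relies on is that set-level aggregation over per-frame decodings is invariant to frame order. The toy below illustrates only that property (the XOR channel, bit-flip noise model, and all names are illustrative assumptions, not the paper's construction; the frame-level key permutation step is omitted for brevity): every frame carries the same encrypted payload, and extraction majority-votes per bit over whichever frames survive, so reordering and deletion do not break recovery.

```python
import numpy as np

rng = np.random.default_rng(42)

BITS = 64
base_key = rng.integers(0, 2, BITS, dtype=np.uint8)   # shared pseudo-random key
watermark = rng.integers(0, 2, BITS, dtype=np.uint8)  # payload to embed

# Embed: 16 frames, each carrying the XOR-encrypted watermark plus
# simulated channel noise (10% independent bit flips per frame).
frames = []
for _ in range(16):
    noise = (rng.random(BITS) < 0.1).astype(np.uint8)
    frames.append((watermark ^ base_key) ^ noise)

# Distort: delete six frames and shuffle the survivors.
survivors = [frames[i] for i in rng.permutation(16)[:10]]

# Extract: decode each surviving frame, then majority-vote per bit.
# The mean over frames is a set-level statistic, so frame order is irrelevant.
decoded = np.stack([f ^ base_key for f in survivors])
recovered = (decoded.mean(axis=0) > 0.5).astype(np.uint8)
```

With ten surviving frames and a 10% flip rate, a per-bit majority vote is wrong only when five or more of the ten copies of that bit flip, so near-perfect recovery is expected despite reordering and deletion.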


Key Contributions

  • Shuffle-Key-based Distribution-preserving Sampling (SKe) that uses a single base pseudo-random binary sequence permuted per frame, converting extraction from synchronization-sensitive decoding to permutation-tolerant set-level aggregation
  • Differential Attention (DA) module that computes inter-frame differences and dynamically reweights extraction attention to counteract temporal distortions like inter-frame compression
  • First generative watermarking framework specifically designed for text-to-video diffusion models, robust against frame reordering, frame deletion, compression, and noise
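The DA contribution, as summarized above, computes inter-frame differences and reweights attention during extraction. A minimal sketch of that reweighting idea follows; the function name, the softmax form, and the neighbor-averaging heuristic are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def differential_attention(frame_feats: np.ndarray, temperature: float = 1.0):
    """Down-weight temporally distorted frames before aggregation.

    frame_feats: (T, D) per-frame feature vectors, T >= 2.
    Returns (T,) attention weights summing to 1; frames with large
    inter-frame difference energy (e.g. heavy inter-frame compression
    artifacts) receive lower weight.
    """
    # Mean absolute difference of each consecutive frame pair: (T-1,)
    diffs = np.abs(np.diff(frame_feats, axis=0)).mean(axis=1)
    # Assign each frame the average difference energy of its transitions.
    energy = np.empty(frame_feats.shape[0])
    energy[0], energy[-1] = diffs[0], diffs[-1]
    energy[1:-1] = 0.5 * (diffs[:-1] + diffs[1:])
    # Softmax over negated energy: high difference -> low attention weight.
    logits = -energy / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()
```

In use, the weights would scale each frame's contribution before set-level aggregation, e.g. `(w[:, None] * per_frame_decodings).sum(axis=0)`, so a corrupted frame perturbs the aggregate less than a clean one.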

🛡️ Threat Analysis

Output Integrity Attack

SKeDA watermarks AI-generated VIDEO OUTPUTS (not model weights) to trace content provenance, verify authenticity, and support copyright attribution — classic output integrity / content watermarking. The watermark is embedded in the generated content itself, not in model parameters.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time, digital
Applications
text-to-video generation, ai-generated content provenance, copyright protection, synthetic media attribution