defense 2026

SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion

Xinjie Zhu, Zijing Zhao, Hui Jin, Qingxiao Guo, Yilong Ma, Yunhao Wang, Xiaobing Guo, Weifeng Zhang

0 citations

Published on arXiv

2603.02882

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves very high bit-accuracy under both temporal and spatial disturbances with minimal overhead on modern video diffusion models with causal 3D VAEs, outperforming non-blind in-generation baselines in scalability and robustness.

SIGMark

Novel technique introduced


Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. As an alternative to post-processing watermarks, which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they require maintaining all message-key pairs and performing template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. To achieve blind extraction, we generate watermarked initial noise using a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC), which reduces the cost of storing large-scale information while preserving the noise distribution and diversity needed for distortion-free watermarking. To enhance robustness, we further design a Segment Group-Ordering module (SGO) tailored to causal 3D VAEs, ensuring robust watermark inversion during extraction under temporal disturbance. Comprehensive experiments on modern diffusion models show that SIGMark achieves very high bit-accuracy during extraction under both temporal and spatial disturbances with minimal overhead, demonstrating its scalability and robustness. Our project is available at https://jeremyzhao1998.github.io/SIGMark-release/.
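To make the idea of blind extraction concrete, the following is a minimal toy sketch, not the paper's GF-PRC construction. It assumes a simple keyed sign-embedding scheme: a per-frame pseudorandom mask is derived from one global key and the frame index, and message bits are hidden in the signs of Gaussian samples. All names (frame_mask, embed_frame, extract_frame, GLOBAL_KEY) are hypothetical, and the step that inverts a generated video back to an estimate of its initial noise is omitted.

```python
import hashlib
import random


def frame_mask(global_key: bytes, frame_idx: int, n_bits: int) -> list:
    """Derive a pseudorandom bit mask for one frame from a single global key."""
    seed = hashlib.sha256(global_key + frame_idx.to_bytes(4, "big")).digest()
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n_bits)]


def embed_frame(bits, global_key, frame_idx, noise_rng):
    """Hide bits in the signs of Gaussian samples; magnitudes stay random."""
    mask = frame_mask(global_key, frame_idx, len(bits))
    noise = []
    for bit, m in zip(bits, mask):
        magnitude = abs(noise_rng.gauss(0.0, 1.0))
        # A positive sign encodes (bit XOR mask) == 1, a negative sign encodes 0.
        noise.append(magnitude if bit ^ m else -magnitude)
    return noise


def extract_frame(noise, global_key, frame_idx):
    """Blind extraction: only the global key and the frame index are needed."""
    mask = frame_mask(global_key, frame_idx, len(noise))
    return [(1 if x > 0 else 0) ^ m for x, m in zip(noise, mask)]


# Hypothetical key and message, for illustration only.
GLOBAL_KEY = b"hypothetical-global-key"
message = [1, 0, 1, 1, 0, 0, 1, 0]
noise_rng = random.Random(0)

frames = [embed_frame(message, GLOBAL_KEY, i, noise_rng) for i in range(3)]
recovered = [extract_frame(f, GLOBAL_KEY, i) for i, f in enumerate(frames)]
print(all(r == message for r in recovered))
```

In this toy, the extractor never consults a table of stored message-key pairs, which is the property the abstract calls blind extraction. A non-blind scheme would instead compare the recovered noise against every stored template, so its cost would grow with the number of watermarked videos.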


Key Contributions

  • GF-PRC (Global set of Frame-wise PseudoRandom Coding keys) enabling blind watermark extraction without storing large-scale message-key pairs, while preserving distortion-free noise distribution
  • SGO (Segment Group-Ordering) module tailored to causal 3D VAEs that ensures robust watermark inversion under temporal disturbances during extraction
  • End-to-end SIGMark framework achieving high bit-accuracy under both temporal and spatial distortions with minimal computational overhead on modern video diffusion models
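The contributions above are evaluated in terms of bit-accuracy. As a small illustration, assuming the usual definition (the fraction of embedded bits that the extractor recovers correctly), the metric can be sketched as follows. The function name and the example bit strings are hypothetical and do not come from the paper.

```python
def bit_accuracy(sent, recovered):
    """Fraction of positions where the recovered bit equals the embedded bit."""
    if len(sent) != len(recovered):
        raise ValueError("bit strings must have the same length")
    if not sent:
        raise ValueError("bit strings must be non-empty")
    matches = sum(1 for a, b in zip(sent, recovered) if a == b)
    return matches / len(sent)


# Made-up example: two of the eight bits are flipped by a disturbance.
sent = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [1, 0, 1, 0, 0, 0, 1, 1]
print(bit_accuracy(sent, recovered))  # 6 of 8 bits match -> 0.75
```

A value near 0.5 corresponds to random guessing for uniformly random bits, so a robust watermark should stay well above that level after temporal or spatial disturbance.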

🛡️ Threat Analysis

Output Integrity Attack

Watermarks are embedded in the video outputs (generated content) of diffusion models to trace provenance and authenticate AI-generated videos. This is content watermarking for output integrity, not model weight protection. The framework enables blind extraction without maintaining message-key pairs, directly addressing scalable content provenance at deployment.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time
Applications
ai-generated video content provenance, harmful content tracing, video diffusion model watermarking