
I2VWM: Robust Watermarking for Image to Video Generation

Guanjie Wang¹, Zehua Ma¹, Han Fang², Weiming Zhang¹

0 citations · 31 references · arXiv


Published on arXiv · 2509.17773

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

I2VWM significantly improves cross-modal watermark robustness across video frames while maintaining imperceptibility, outperforming single-modality watermarking baselines on I2V models.

I2VWM

Novel technique introduced


The rapid progress of image-to-video (I2V) generation has raised concerns about its potential misuse in misinformation and fraud, underscoring the urgent need for effective digital watermarking. While existing watermarking methods demonstrate robustness within a single modality, they fail to trace source images in I2V settings. To address this gap, we introduce the concept of Robust Diffusion Distance, which measures the temporal persistence of watermark signals in generated videos. Building on this, we propose I2VWM, a cross-modal watermarking framework designed to enhance watermark robustness across time. I2VWM leverages a video-simulation noise layer during training and employs an optical-flow-based alignment module during inference. Experiments on both open-source and commercial I2V models demonstrate that I2VWM significantly improves robustness while maintaining imperceptibility, establishing a new paradigm for cross-modal watermarking in the era of generative video. Code released: https://github.com/MrCrims/I2VWM-Robust-Watermarking-for-Image-to-Video-Generation
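The abstract's alignment module registers each generated frame back to the watermarked source image before decoding, so that spatial drift introduced by the I2V model does not destroy the watermark signal. The paper uses optical flow for this; as a minimal self-contained sketch, the snippet below substitutes global phase correlation (a simpler registration technique handling pure translation) as a stand-in. All function names here are illustrative, not from the paper's codebase.

```python
import numpy as np

def estimate_shift(ref, frame):
    """Estimate the (dy, dx) translation taking `ref` to `frame` via phase correlation."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(frame)
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + 1e-12           # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real          # delta peak at the shift location
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if dy > h // 2:                          # wrap indices to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def align_to_reference(ref, frame):
    """Undo the estimated translation so the watermark decoder sees a registered frame."""
    dy, dx = estimate_shift(ref, frame)
    return np.roll(frame, (-dy, -dx), axis=(0, 1))
```

A real I2V frame undergoes non-rigid motion, which is why the paper's module relies on dense optical flow (per-pixel displacement fields) rather than a single global shift; the decode-side idea of "warp back, then extract" is the same.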


Key Contributions

  • Introduces Robust Diffusion Distance, a metric measuring temporal persistence of watermark signals across frames in I2V-generated videos
  • Proposes I2VWM, a cross-modal watermarking framework using a video-simulation noise layer during training and optical-flow-based alignment at inference to maintain watermark robustness
  • Demonstrates effectiveness on both open-source (Stable Video Diffusion, HunyuanVideo) and commercial I2V models
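The Robust Diffusion Distance introduced above measures how long the watermark signal persists across generated frames. Its formal definition is not reproduced in this summary; a plausible operationalization, sketched below under that assumption, decodes the watermark from each frame in order and counts how many leading frames stay above a bit-accuracy threshold. Function names and the threshold value are illustrative.

```python
import numpy as np

def bit_accuracy(decoded_bits, true_bits):
    """Fraction of watermark bits recovered correctly from one frame."""
    return float(np.mean(np.asarray(decoded_bits) == np.asarray(true_bits)))

def robust_diffusion_distance(per_frame_bits, true_bits, threshold=0.75):
    """Count leading frames whose decoded accuracy stays at or above `threshold`.

    per_frame_bits: list of decoded bit arrays, one per generated frame,
    ordered from the first frame onward.
    """
    for t, bits in enumerate(per_frame_bits):
        if bit_accuracy(bits, true_bits) < threshold:
            return t          # watermark considered lost from frame t onward
    return len(per_frame_bits)  # watermark survives the whole clip
```

Under this reading, a larger distance means the watermark diffuses further into the generated video, which is the property I2VWM's noise layer and alignment module are trained to maximize.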

🛡️ Threat Analysis

Output Integrity Attack

Proposes content watermarking of AI-generated video outputs for provenance tracking — watermarks are embedded in source images to persist into generated videos, enabling traceability and authenticity verification of AI-generated content. This is output integrity and content provenance, not model IP protection.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time
Datasets
MS-COCO, DIV2K (NTIRE 2017)
Applications
image-to-video generation, video content provenance, deepfake/misuse tracing