Guidance Watermarking for Diffusion Models

This paper introduces a novel watermarking method for diffusion models. It guides the diffusion process using the gradient computed from any off-the-shelf watermark decoder. The gradient computation incorporates different image augmentations, which increases robustness to attacks against which the decoder was not originally robust, without retraining or fine-tuning. Our method effectively converts any post-hoc watermarking scheme into an in-generation embedding along the diffusion process. We show that this approach is complementary to watermarking techniques that modify the variational autoencoder at the end of the diffusion process. We validate the method on different diffusion models and detectors. The watermarking guidance does not significantly alter the generated image for a given seed and prompt, preserving both the diversity and quality of generation.

Key Contributions

Guidance-based watermarking that steers the diffusion sampling process using gradients from an off-the-shelf watermark decoder, converting any post-hoc scheme into in-generation embedding
Incorporation of image augmentations during gradient computation to improve robustness against attacks beyond the decoder's original training distribution without retraining or fine-tuning
Demonstrated complementarity with VAE-level watermarking techniques and validation across multiple diffusion models and detectors with minimal impact on image quality and diversity
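
The guidance idea in the contributions above can be illustrated with a small toy sketch. It is not the paper's implementation: the linear "decoder" W, the brightness-and-noise augmentations, the binary cross-entropy loss, the stand-in denoising step, and every name and constant below (watermark_grad, sample, guidance_scale, D, K) are assumptions chosen so that the example runs with NumPy alone and the gradient can be written by hand. A real setup would presumably backpropagate through a neural watermark decoder inside an actual diffusion sampler.

```python
import numpy as np

D, K = 64, 16  # toy sizes: flattened "image" dimension and message length
setup_rng = np.random.default_rng(0)

# Hypothetical stand-in for an off-the-shelf decoder: K linear read-outs
# with orthonormal rows, so each bit is read from its own direction.
W = np.linalg.qr(setup_rng.standard_normal((D, K)))[0].T  # shape (K, D)
message = setup_rng.integers(0, 2, size=K).astype(float)  # bits to embed
clean = setup_rng.standard_normal(D)  # image the unguided sampler converges to


def decode_logits(x):
    return W @ x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def watermark_grad(x, rng, n_aug=8):
    """Gradient of BCE(decoder(aug(x)), message) w.r.t. x, averaged over augmentations."""
    grad = np.zeros_like(x)
    for _ in range(n_aug):
        s = rng.uniform(0.7, 1.3)               # brightness-like scaling (assumed augmentation)
        n = 0.1 * rng.standard_normal(x.shape)  # additive noise (assumed augmentation)
        p = sigmoid(decode_logits(s * x + n))
        grad += s * (W.T @ (p - message))       # chain rule through augmentation and decoder
    return grad / n_aug


def sample(guidance_scale, steps=50, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(D)           # start from pure noise
    for _ in range(steps):
        x = x + 0.1 * (clean - x)        # stand-in for one denoising step
        if guidance_scale > 0:
            x = x - guidance_scale * watermark_grad(x, rng)  # watermark guidance
    return x


def bit_accuracy(x):
    return float(np.mean((decode_logits(x) > 0) == (message > 0.5)))


plain = sample(guidance_scale=0.0)
marked = sample(guidance_scale=1.0)
print("bit accuracy without guidance:", bit_accuracy(plain))
print("bit accuracy with guidance:   ", bit_accuracy(marked))
print("relative change in the sample:", np.linalg.norm(marked - plain) / np.linalg.norm(plain))
```

Both runs share the same seed, so the printed relative change gives a rough sense of how far guidance moves the sample. In this toy setting, guidance_scale trades decoding reliability against distortion; how the paper sets or schedules the guidance strength is not stated here.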

🛡️ Threat Analysis

Output Integrity Attack

Watermarks are embedded in the diffusion model's outputs (the generated images) to track content provenance. This is content watermarking for output integrity, not model IP protection. The method converts post-hoc watermarking schemes into in-generation embedding, directly addressing the authentication of AI-generated content.

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

inference_time, digital

Applications

2026 0 cit.
