Guidance Watermarking for Diffusion Models
Enoal Gesny 1,2,3,4, Eva Giboulot 1,2,3,4, Teddy Furon 1,2,3,4, Vivien Chappelier 5
Published on arXiv
2509.22126
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Converts arbitrary post-hoc image watermarking schemes into robust in-generation embedding for diffusion models while preserving generation quality and diversity across seeds and prompts.
Guidance Watermarking
Novel technique introduced
This paper introduces a novel watermarking method for diffusion models. It is based on guiding the diffusion process using the gradient computed from any off-the-shelf watermark decoder. The gradient computation encompasses different image augmentations, increasing robustness to attacks against which the decoder was not originally robust, without retraining or fine-tuning. Our method effectively converts any post-hoc watermarking scheme into an in-generation embedding along the diffusion process. We show that this approach is complementary to watermarking techniques that modify the variational autoencoder at the end of the diffusion process. We validate the method on different diffusion models and detectors. The watermarking guidance does not significantly alter the generated image for a given seed and prompt, preserving both the diversity and quality of generation.
Key Contributions
- Guidance-based watermarking that steers the diffusion sampling process using gradients from an off-the-shelf watermark decoder, converting any post-hoc scheme into in-generation embedding
- Incorporation of image augmentations during gradient computation to improve robustness against attacks beyond the decoder's original training distribution without retraining or fine-tuning
- Demonstrated complementarity with VAE-level watermarking techniques and validation across multiple diffusion models and detectors with minimal impact on image quality and diversity
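The first two contributions can be illustrated with a minimal toy sketch. It is not the authors' implementation: the "image" is a plain vector, the decoder is a hypothetical linear map with random ±1 keys, the augmentations (a random gain plus additive noise) are stand-ins chosen so the gradient can be written by hand, and the denoising update of a real sampler is omitted. All function names, the step count, and the guidance scale are assumptions made for illustration. What it shows is the general idea: at each sampling step, nudge the sample along the gradient of the decoder's message log-likelihood, averaged over random augmentations.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_logits(x, keys):
    # Toy linear decoder (assumption): one logit per bit, a dot product with a key.
    return [sum(k * v for k, v in zip(key, x)) for key in keys]


def decode_bits(x, keys):
    return [1 if z > 0 else 0 for z in decode_logits(x, keys)]


def guidance_gradient(x, message, keys, n_aug, rng):
    # Average, over random augmentations, of the gradient with respect to x
    # of log p(message | decoder(augment(x))).
    grad = [0.0] * len(x)
    for _ in range(n_aug):
        gain = rng.uniform(0.8, 1.2)              # toy brightness change
        noise = [rng.gauss(0.0, 0.1) for _ in x]  # toy additive noise
        x_aug = [gain * v + n for v, n in zip(x, noise)]
        for bit, key, z in zip(message, keys, decode_logits(x_aug, keys)):
            sign = 1.0 if bit == 1 else -1.0
            # d/dz log sigmoid(sign * z) = sign * sigmoid(-sign * z)
            w = sign * sigmoid(-sign * z)
            # Chain rule through the augmentation: d x_aug / d x = gain.
            for i, k in enumerate(key):
                grad[i] += w * gain * k / n_aug
    return grad


def guided_sampling(x, message, keys, steps=200, scale=0.02, n_aug=4, seed=0):
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        # A real sampler would also apply its denoising update here (omitted).
        g = guidance_gradient(x, message, keys, n_aug, rng)
        x = [v + scale * gv for v, gv in zip(x, g)]
    return x


# Hypothetical setup: 8-bit message, 32-dimensional "image".
rng = random.Random(1)
dim, n_bits = 32, 8
keys = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n_bits)]
message = [rng.randint(0, 1) for _ in range(n_bits)]
x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
x_wm = guided_sampling(x0, message, keys)
```

In a real pipeline the decoder would be a neural network and the gradient would come from automatic differentiation through differentiable augmentations. The hand-written chain rule here only works because both the toy decoder and the toy augmentations are linear in the input.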
🛡️ Threat Analysis
Watermarks are embedded in the diffusion model's outputs (the generated images) to track content provenance. This is content watermarking for output integrity, not model IP protection. The method converts post-hoc watermarking schemes into in-generation embedding, directly addressing AI-generated content authentication.