Learning to Watermark in the Latent Space of Generative Models
Sylvestre-Alvise Rebuffi 1, Tuan Tran 1, Valeriu Lacatusu 1, Pierre Fernandez 1, Tomáš Souček 1, Nikola Jovanović 1,2, Tom Sander 1, Hady Elsahar 1, Alexandre Mourachko 1
1 Meta
Published on arXiv (arXiv:2601.16140)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Latent-space watermarking achieves robustness competitive with pixel-space post-hoc watermarking methods at up to a 20x speedup, while enabling tamper-resistant in-model deployment.
DistSeal
Novel technique introduced
Existing approaches for watermarking AI-generated images often rely on post-hoc methods applied in pixel space, introducing computational overhead and potential visual artifacts. In this work, we explore latent space watermarking and introduce DistSeal, a unified approach for latent watermarking that works across both diffusion and autoregressive models. Our approach works by training post-hoc watermarking models in the latent space of generative models. We demonstrate that these latent watermarkers can be effectively distilled either into the generative model itself or into the latent decoder, enabling in-model watermarking. The resulting latent watermarks achieve competitive robustness while offering similar imperceptibility and up to 20x speedup compared to pixel-space baselines. Our experiments further reveal that distilling latent watermarkers outperforms distilling pixel-space ones, providing a solution that is both more efficient and more robust.
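As a sketch of the core idea, the toy example below embeds a bit string into a latent vector by overwriting a small secret subspace and reads it back by projection. The orthonormal-carrier scheme, all names, and all parameters are illustrative assumptions, not the paper's trained embedder:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, NUM_BITS = 64, 16

# Secret orthonormal carrier directions, one per message bit
# (an illustrative stand-in for a trained latent watermark embedder).
CARRIERS = np.linalg.qr(rng.standard_normal((LATENT_DIM, NUM_BITS)))[0].T

def embed(latent, bits, strength=1.0):
    """Write the message into the carrier subspace of the latent."""
    signs = 2 * np.asarray(bits) - 1                      # 0/1 -> -1/+1
    residual = latent - CARRIERS.T @ (CARRIERS @ latent)  # drop old component
    return residual + strength * signs @ CARRIERS

def extract(latent):
    """Read the message back by projecting onto each carrier."""
    return (CARRIERS @ latent > 0).astype(int)

latent = rng.standard_normal(LATENT_DIM)
bits = rng.integers(0, 2, NUM_BITS)
wm_latent = embed(latent, bits)
noisy = wm_latent + 0.1 * rng.standard_normal(LATENT_DIM)  # mild perturbation
```

Because embedding happens before the latent decoder runs, the perturbation is spread across many pixels by the decoder rather than applied directly in pixel space — this is what the paper exploits for efficiency and imperceptibility.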
Key Contributions
- DistSeal: a unified latent-space watermarking framework compatible with both diffusion and autoregressive image generation models, achieving up to 20x speedup over pixel-space baselines
- Demonstrates that latent watermarkers can be distilled into generative model weights or the latent decoder for in-model watermarking that cannot be bypassed in open-source deployments
- Empirically shows that distilling latent watermarkers is more effective than distilling pixel-space watermarkers, achieving state-of-the-art robustness for in-model watermarking
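The distillation step in the second contribution can be illustrated with a toy linear decoder: a copy of the decoder is fine-tuned until decoding alone reproduces the watermarked output, so the watermark moves into the model weights and cannot be stripped off as a separate post-processing stage. The shapes, the linear watermarker `wm`, and the SGD loop below are hypothetical simplifications, not the paper's architecture or training recipe:

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM, PIXEL_DIM = 8, 32

# Frozen toy "latent decoder" and a fixed linear latent watermarker
# (hypothetical stand-ins for a real VAE decoder and latent embedder).
decoder = rng.standard_normal((PIXEL_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
wm = np.eye(LATENT_DIM) + 0.1 * rng.standard_normal((LATENT_DIM, LATENT_DIM))

def teacher(z):
    """Post-hoc pipeline: watermark the latent, then decode."""
    return decoder @ (wm @ z)

# Distillation: fine-tune a copy of the decoder so that decoding alone
# reproduces the watermarked output (in-model watermarking).
student = decoder.copy()
lr = 0.1
for _ in range(500):
    z = rng.standard_normal(LATENT_DIM)
    err = student @ z - teacher(z)      # residual on this sample
    student -= lr * np.outer(err, z)    # SGD step on 0.5 * ||err||^2

z_test = rng.standard_normal(LATENT_DIM)
```

After training, the student decoder emits watermarked outputs with no extra inference-time stage — in an open-source release, a user who downloads the student weights cannot simply skip a watermarking step, because there is none to skip.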
🛡️ Threat Analysis
Watermarks are embedded in model OUTPUTS (generated images) to authenticate synthetic origin and trace content provenance — this is content watermarking for output integrity, not model IP protection. The paper explicitly targets AI-generated content authentication and deepfake attribution.