defense 2026

Learning to Watermark in the Latent Space of Generative Models

Sylvestre-Alvise Rebuffi 1, Tuan Tran 1, Valeriu Lacatusu 1, Pierre Fernandez 1, Tomáš Souček 1, Nikola Jovanović 1,2, Tom Sander 1, Hady Elsahar 1, Alexandre Mourachko 1

0 citations · 62 references · arXiv


Published on arXiv · 2601.16140

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Latent-space watermarking achieves robustness competitive with pixel-space post-hoc watermarking methods at up to a 20x speedup, while enabling tamper-resistant in-model deployment.

DistSeal

Novel technique introduced


Existing approaches for watermarking AI-generated images often rely on post-hoc methods applied in pixel space, introducing computational overhead and potential visual artifacts. In this work, we explore latent space watermarking and introduce DistSeal, a unified approach for latent watermarking that works across both diffusion and autoregressive models. Our approach works by training post-hoc watermarking models in the latent space of generative models. We demonstrate that these latent watermarkers can be effectively distilled either into the generative model itself or into the latent decoder, enabling in-model watermarking. The resulting latent watermarks achieve competitive robustness while offering similar imperceptibility and up to 20x speedup compared to pixel-space baselines. Our experiments further reveal that distilling latent watermarkers outperforms distilling pixel-space ones, providing a solution that is both more efficient and more robust.
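The core idea above — perturb the latent rather than the pixels, and read the mark back out of the latent residual — can be illustrated with a toy sketch. This is a hand-rolled additive scheme over numpy arrays, not DistSeal's learned watermarker: the function names, the fixed pseudo-random bit directions, and the `strength` parameter are all assumptions for illustration.

```python
import numpy as np

def embed_watermark_latent(z, message_bits, strength=0.05, key=0):
    # Map each message bit to a fixed pseudo-random direction in latent space.
    # (Hypothetical additive scheme; DistSeal trains a network for this step.)
    rng = np.random.default_rng(key)
    directions = rng.standard_normal((len(message_bits), z.size))
    signs = 2 * np.asarray(message_bits) - 1          # {0,1} -> {-1,+1}
    delta = (signs @ directions).reshape(z.shape)
    return z + strength * delta / np.linalg.norm(delta)

def detect_watermark_latent(z_wm, z, n_bits, key=0):
    # Recover each bit by correlating the residual with its secret direction.
    rng = np.random.default_rng(key)
    directions = rng.standard_normal((n_bits, z.size))
    residual = (z_wm - z).ravel()
    return (directions @ residual > 0).astype(int)

z = np.random.default_rng(1).standard_normal((4, 8, 8))  # toy latent tensor
bits = [1, 0, 1, 1, 0, 0, 1, 0]
z_wm = embed_watermark_latent(z, bits)
recovered = detect_watermark_latent(z_wm, z, len(bits))
```

Because the latent is orders of magnitude smaller than the decoded image, operating here is what makes the reported speedup over pixel-space post-hoc methods plausible.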


Key Contributions

  • DistSeal: a unified latent-space watermarking framework compatible with both diffusion and autoregressive image generation models, achieving up to 20x speedup over pixel-space baselines
  • Demonstrates that latent watermarkers can be distilled into generative model weights or the latent decoder for in-model watermarking that cannot be bypassed in open-source deployments
  • Empirically shows that distilling latent watermarkers is more effective than distilling pixel-space watermarkers, achieving state-of-the-art robustness for in-model watermarking
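The distillation step in the second bullet can be sketched as a regression problem: make a student decoder map *clean* latents directly to the watermarked outputs of the frozen teacher pipeline, so the watermark ships inside the model weights. The linear "decoder", dimensions, and closed-form least-squares fit below are toy stand-ins for the paper's gradient-based training of a real latent decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 16, 32                                    # toy latent and pixel dims
W = rng.standard_normal((p, d)) * 0.1            # frozen linear "decoder"
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                           # secret watermark direction

def teacher(z):
    # Teacher pipeline: watermark the latent, then decode with the frozen decoder.
    return W @ (z + 0.1 * u)

# Distill: fit an affine student that maps clean latents straight to the
# watermarked outputs (least-squares stand-in for SGD fine-tuning).
Z = rng.standard_normal((256, d))
Y = np.stack([teacher(zi) for zi in Z])
Z_aug = np.hstack([Z, np.ones((256, 1))])        # absorb bias term
theta, *_ = np.linalg.lstsq(Z_aug, Y, rcond=None)
W_student, b_student = theta[:d].T, theta[d]

z = rng.standard_normal(d)
out = W_student @ z + b_student                  # watermarked with no extra pass
```

After distillation, every decode is watermarked by construction, which is why in-model deployment is harder to strip than a separable post-hoc watermarking stage.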

🛡️ Threat Analysis

Output Integrity Attack

Watermarks are embedded in model OUTPUTS (generated images) to authenticate synthetic origin and trace content provenance — this is content watermarking for output integrity, not model IP protection. The paper explicitly targets AI-generated content authentication and deepfake attribution.
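Provenance verification in this threat model reduces to comparing the bits recovered from a suspect image against the owner's message and bounding the false-positive rate. A minimal stdlib sketch (the function name and decision logic are illustrative, not the paper's detector):

```python
import math

def watermark_pvalue(recovered_bits, message_bits):
    # p-value under H0 "not watermarked": each bit matches by chance with
    # probability 1/2, so the match count k follows Binomial(n, 0.5).
    n = len(message_bits)
    k = sum(int(r == m) for r, m in zip(recovered_bits, message_bits))
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, tail

msg = [1, 0, 1, 1, 0, 0, 1, 0] * 4               # 32-bit message
k, p = watermark_pvalue(msg, msg)                # perfect recovery: p = 2^-32
```

A small p-value lets the verifier attribute an image to the generator (e.g. for deepfake attribution) with an explicit false-accusation bound.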


Details

Domains
vision, generative
Model Types
diffusion, transformer
Threat Tags
inference_time, digital
Applications
ai-generated image watermarking, content provenance tracking, synthetic media attribution, deepfake detection