StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models
Haoxin Yang 1, Bangzhen Liu 1, Xuemiao Xu 1, Cheng Xu 2, Yuyang Yu 1, Zikai Huang 1, Yi Wang 3, Shengfeng He 2
Published on arXiv: 2509.17993
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
StableGuard consistently outperforms state-of-the-art methods across image fidelity, watermark verification accuracy, and tampered region localization on LDM-generated images.
Novel technique introduced: StableGuard
The advancement of diffusion models has enhanced the realism of AI-generated content but has also raised concerns about misuse, necessitating robust copyright protection and tampering localization. Although recent methods have made progress toward unified solutions, their reliance on post hoc processing makes them inconvenient to apply and compromises forensic reliability. We propose StableGuard, a novel framework that integrates a binary watermark directly into the diffusion generation process, providing copyright protection and tampering localization in Latent Diffusion Models through an end-to-end design. We develop a Multiplexing Watermark VAE (MPW-VAE) by equipping a pretrained Variational Autoencoder (VAE) with a lightweight latent residual-based adapter, enabling the generation of paired watermarked and watermark-free images. These pairs, fused via random masks, create a diverse dataset for training a tampering-agnostic forensic network. To further enhance forensic synergy, we introduce a Mixture-of-Experts Guided Forensic Network (MoE-GFN) that dynamically integrates holistic watermark patterns, local tampering traces, and frequency-domain cues for precise watermark verification and tampered region detection. The MPW-VAE and MoE-GFN are jointly optimized in a self-supervised, end-to-end manner, so that watermark embedding and forensic accuracy reinforce each other during training. Extensive experiments demonstrate that StableGuard consistently outperforms state-of-the-art methods in image fidelity, watermark verification, and tampering localization.
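The mask-fusion step described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the random mask is a single rectangle and that masked pixels are taken from the watermark-free image, so the mask itself serves as the tamper label. The helper names `random_rect_mask` and `fuse_pair` are hypothetical and are not taken from the paper.

```python
import random

def random_rect_mask(height, width, rng):
    """Return a binary mask (list of rows) with one random rectangle set to 1.

    Assumption: a single rectangle stands in for the paper's random masks.
    """
    top = rng.randrange(height)
    left = rng.randrange(width)
    bottom = rng.randrange(top + 1, height + 1)   # exclusive bound, > top
    right = rng.randrange(left + 1, width + 1)    # exclusive bound, > left
    return [[1 if top <= y < bottom and left <= x < right else 0
             for x in range(width)]
            for y in range(height)]

def fuse_pair(watermarked, clean, mask):
    """Take watermark-free pixels where mask == 1, watermarked pixels elsewhere."""
    return [[c if m else w for w, c, m in zip(w_row, c_row, m_row)]
            for w_row, c_row, m_row in zip(watermarked, clean, mask)]

# Toy single-channel "images": 1.0 marks watermarked pixels, 0.0 marks clean ones.
rng = random.Random(0)
H, W = 4, 6
watermarked = [[1.0] * W for _ in range(H)]
clean = [[0.0] * W for _ in range(H)]

mask = random_rect_mask(H, W, rng)
fused = fuse_pair(watermarked, clean, mask)  # training input; mask is the label
```

Because the mask is sampled rather than produced by a specific editing tool, a forensic network trained on such pairs sees no fixed tampering pattern, which is one plausible reading of "tampering-agnostic" here.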
Key Contributions
- Multiplexing Watermark VAE (MPW-VAE) using a lightweight latent residual-based adapter that generates paired watermarked/watermark-free images from a pretrained LDM VAE without modifying the diffusion backbone
- Mixture-of-Experts Guided Forensic Network (MoE-GFN) that integrates holistic watermark patterns, local tampering traces, and frequency-domain cues for simultaneous watermark verification and pixel-level tamper localization
- End-to-end self-supervised co-training between MPW-VAE and MoE-GFN using mask-fused paired images, eliminating post hoc processing and improving forensic reliability
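As a rough illustration of how a mixture-of-experts module might combine the three cues named above, the sketch below applies a generic soft gate: gate scores are turned into weights with a softmax, and the expert features are averaged with those weights. This summary does not describe the gating used in MoE-GFN, so the function names, the feature vectors, and the gate scores are all assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_experts(expert_features, gate_scores):
    """Weighted sum of equally sized expert feature vectors (generic soft gate)."""
    weights = softmax(gate_scores)
    dim = len(expert_features[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, expert_features))
             for i in range(dim)]
    return fused, weights

# Hypothetical 2-dimensional features from three experts.
experts = {
    "holistic_watermark": [1.0, 0.0],
    "local_tamper": [0.0, 1.0],
    "frequency": [0.5, 0.5],
}
gate_scores = [0.0, 0.0, 0.0]  # equal scores give equal weights

fused, weights = fuse_experts(list(experts.values()), gate_scores)
```

In a trained network the gate scores would be predicted from the input, so the weighting can shift toward whichever cue is most informative for a given image.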
🛡️ Threat Analysis
StableGuard watermarks the image outputs of Latent Diffusion Models (not the model weights) for copyright protection and content provenance. The MoE-GFN forensic network then verifies these content watermarks and localizes tampered regions, which places the work squarely in output integrity and content authenticity.
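The two forensic tasks mentioned here are commonly scored with bit accuracy (for watermark verification) and intersection over union (for tamper localization). The sketch below shows both on toy data. Whether the paper reports exactly these metrics is not stated in this summary, so treat the choice of metrics and the helper names as assumptions.

```python
def bit_accuracy(decoded_bits, reference_bits):
    """Fraction of decoded watermark bits that match the reference message."""
    if not reference_bits or len(decoded_bits) != len(reference_bits):
        raise ValueError("bit strings must be non-empty and of equal length")
    matches = sum(d == r for d, r in zip(decoded_bits, reference_bits))
    return matches / len(reference_bits)

def mask_iou(predicted, truth):
    """Intersection over union of two binary masks given as lists of rows."""
    intersection = 0
    union = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0

# Toy watermark: one of eight bits is decoded incorrectly.
reference = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = [1, 0, 1, 0, 0, 0, 1, 0]
acc = bit_accuracy(decoded, reference)

# Toy tamper masks: the prediction misses one tampered pixel.
truth = [[0, 1, 1],
         [0, 1, 1]]
pred = [[0, 0, 1],
        [0, 1, 1]]
iou = mask_iou(pred, truth)
```

For this toy data, `acc` is 7/8 and `iou` is 3/4.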