defense 2025

StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models

Haoxin Yang 1, Bangzhen Liu 1, Xuemiao Xu 1, Cheng Xu 2, Yuyang Yu 1, Zikai Huang 1, Yi Wang 3, Shengfeng He 2

1 citation · 71 references · arXiv

Published on arXiv

2509.17993

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

StableGuard consistently outperforms state-of-the-art methods across image fidelity, watermark verification accuracy, and tampered region localization on LDM-generated images.

StableGuard

Novel technique introduced


The advancement of diffusion models has enhanced the realism of AI-generated content but also raised concerns about misuse, necessitating robust copyright protection and tampering localization. Although recent methods have made progress toward unified solutions, their reliance on post hoc processing introduces considerable application inconvenience and compromises forensic reliability. We propose StableGuard, a novel framework that seamlessly integrates a binary watermark into the diffusion generation process, ensuring copyright protection and tampering localization in Latent Diffusion Models through an end-to-end design. We develop a Multiplexing Watermark VAE (MPW-VAE) by equipping a pretrained Variational Autoencoder (VAE) with a lightweight latent residual-based adapter, enabling the generation of paired watermarked and watermark-free images. These pairs, fused via random masks, create a diverse dataset for training a tampering-agnostic forensic network. To further enhance forensic synergy, we introduce a Mixture-of-Experts Guided Forensic Network (MoE-GFN) that dynamically integrates holistic watermark patterns, local tampering traces, and frequency-domain cues for precise watermark verification and tampered region detection. The MPW-VAE and MoE-GFN are jointly optimized in a self-supervised, end-to-end manner, fostering a reciprocal training between watermark embedding and forensic accuracy. Extensive experiments demonstrate that StableGuard consistently outperforms state-of-the-art methods in image fidelity, watermark verification, and tampering localization.
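The mask-based fusion of paired images mentioned in the abstract can be sketched roughly as follows. This is only an illustrative NumPy sketch, not the authors' implementation: the function name `fuse_with_random_mask`, the rectangular mask shape, and the `max_frac` parameter are all assumptions introduced here. The abstract states only that the pairs are "fused via random masks".

```python
import numpy as np

def fuse_with_random_mask(watermarked, clean, rng, max_frac=0.5):
    """Composite a watermarked image with its watermark-free twin.

    Hypothetical helper: pixels inside a random rectangle are taken from
    `clean` (simulating a tampered region); all other pixels come from
    `watermarked`. Returns the fused image and the binary tamper mask
    (1 = tampered), which can serve as a free localization label.
    """
    h, w = watermarked.shape[:2]
    # Rectangle size: at least 1 pixel, at most max_frac of each side.
    rh = rng.integers(1, max(2, int(h * max_frac) + 1))
    rw = rng.integers(1, max(2, int(w * max_frac) + 1))
    top = rng.integers(0, h - rh + 1)
    left = rng.integers(0, w - rw + 1)

    mask = np.zeros((h, w, 1), dtype=watermarked.dtype)
    mask[top:top + rh, left:left + rw] = 1

    # Per-pixel blend: watermark survives only outside the mask.
    fused = (1 - mask) * watermarked + mask * clean
    return fused, mask[..., 0]

rng = np.random.default_rng(0)
wm_img = np.ones((64, 64, 3))      # stand-in for a watermarked image
clean_img = np.zeros((64, 64, 3))  # stand-in for its watermark-free pair
fused_img, tamper_mask = fuse_with_random_mask(wm_img, clean_img, rng)
```

Because the mask is generated alongside the fused image, supervision for the localization task comes for free, which is consistent with the self-supervised training the abstract describes.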


Key Contributions

  • Multiplexing Watermark VAE (MPW-VAE) using a lightweight latent residual-based adapter that generates paired watermarked/watermark-free images from a pretrained LDM VAE without modifying the diffusion backbone
  • Mixture-of-Experts Guided Forensic Network (MoE-GFN) that integrates holistic watermark patterns, local tampering traces, and frequency-domain cues for simultaneous watermark verification and pixel-level tamper localization
  • End-to-end self-supervised co-training between MPW-VAE and MoE-GFN using mask-fused paired images, eliminating post hoc processing and improving forensic reliability
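
To make the mixture-of-experts idea in the contributions above more concrete, here is a minimal NumPy sketch of gated fusion over three feature streams (standing in for holistic watermark, local tampering, and frequency-domain cues). The gating design, the names `moe_fuse` and `gate_weights`, and the tensor shapes are assumptions made for illustration; the summary does not specify how MoE-GFN computes its gates.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_fuse(expert_feats, gate_weights):
    """Fuse expert features with input-dependent soft gates.

    expert_feats: array of shape (E, N, D), one (N, D) feature map per expert.
    gate_weights: array of shape (D, E), a hypothetical linear gating layer.
    Returns the fused (N, D) features and the (N, E) gate values.
    """
    pooled = expert_feats.mean(axis=0)   # (N, D) summary fed to the gate
    logits = pooled @ gate_weights       # (N, E) one score per expert
    gates = softmax(logits, axis=-1)     # (N, E) each row sums to 1
    # Weighted sum of the experts, token by token.
    fused = np.einsum("ne,end->nd", gates, expert_feats)
    return fused, gates

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 8))       # 3 experts, 4 tokens, 8 channels
fused_feats, gate_vals = moe_fuse(feats, rng.normal(size=(8, 3)))
```

With all-zero gate weights the gates become uniform and the fusion reduces to a plain average of the experts, which is a convenient sanity check for a sketch like this.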

🛡️ Threat Analysis

Output Integrity Attack

StableGuard watermarks AI-generated image outputs (not model weights) from Latent Diffusion Models for copyright protection and content provenance. The MoE-GFN forensic network then verifies these content watermarks and localizes tampered regions, which places the work squarely in output integrity and content authenticity.
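
The two forensic tasks named above are commonly scored with bit accuracy (for watermark verification) and intersection-over-union (for localization). The sketch below shows both under that assumption; the summary does not state which metrics the paper reports, and the function names are hypothetical.

```python
import numpy as np

def bit_accuracy(decoded_bits, true_bits):
    """Fraction of watermark bits recovered correctly."""
    decoded = np.asarray(decoded_bits).astype(bool)
    true = np.asarray(true_bits).astype(bool)
    return float((decoded == true).mean())

def mask_iou(pred_mask, true_mask):
    """Intersection-over-union between predicted and true tamper masks."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Both masks empty: treat "nothing tampered" as a perfect match.
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)

acc = bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1])      # 3 of 4 bits match
iou = mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])  # overlap 1, union 2
```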


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time, digital
Applications
ai-generated image copyright protection, digital image forensics, tamper localization