defense 2025

StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models

Haoxin Yang ¹, Bangzhen Liu ¹, Xuemiao Xu ¹, Cheng Xu ², Yuyang Yu ¹, Zikai Huang ¹, Yi Wang ³, Shengfeng He ²

¹ South China University of Technology

² Singapore Management University

³ Dongguan University of Technology

1 citation · 71 references · arXiv

Published on arXiv: 2509.17993

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

StableGuard consistently outperforms state-of-the-art methods across image fidelity, watermark verification accuracy, and tampered region localization on LDM-generated images.

Novel technique introduced: StableGuard

The advancement of diffusion models has enhanced the realism of AI-generated content but also raised concerns about misuse, necessitating robust copyright protection and tampering localization. Although recent methods have made progress toward unified solutions, their reliance on post hoc processing introduces considerable application inconvenience and compromises forensic reliability. We propose StableGuard, a novel framework that seamlessly integrates a binary watermark into the diffusion generation process, ensuring copyright protection and tampering localization in Latent Diffusion Models through an end-to-end design. We develop a Multiplexing Watermark VAE (MPW-VAE) by equipping a pretrained Variational Autoencoder (VAE) with a lightweight latent residual-based adapter, enabling the generation of paired watermarked and watermark-free images. These pairs, fused via random masks, create a diverse dataset for training a tampering-agnostic forensic network. To further enhance forensic synergy, we introduce a Mixture-of-Experts Guided Forensic Network (MoE-GFN) that dynamically integrates holistic watermark patterns, local tampering traces, and frequency-domain cues for precise watermark verification and tampered region detection. The MPW-VAE and MoE-GFN are jointly optimized in a self-supervised, end-to-end manner, fostering a reciprocal training between watermark embedding and forensic accuracy. Extensive experiments demonstrate that StableGuard consistently outperforms state-of-the-art methods in image fidelity, watermark verification, and tampering localization.
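The mask-based fusion of watermarked and watermark-free images described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the rectangular mask shape, and the convention that watermark-free pixels stand in for tampered regions are all assumptions made for illustration, and plain nested lists stand in for decoded images.

```python
import random


def random_rect_mask(height, width, rng):
    """Return a binary mask (1 = simulated tampering) containing one random rectangle.

    The rectangular shape is an assumption; the source only says "random masks".
    """
    top = rng.randrange(height)
    left = rng.randrange(width)
    bottom = rng.randrange(top + 1, height + 1)
    right = rng.randrange(left + 1, width + 1)
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]


def fuse_pair(watermarked, clean, mask):
    """Take pixels from `clean` where mask == 1 and from `watermarked` elsewhere."""
    return [
        [c if m else w for w, c, m in zip(w_row, c_row, m_row)]
        for w_row, c_row, m_row in zip(watermarked, clean, mask)
    ]


rng = random.Random(0)
H, W = 4, 6
watermarked = [[1.0] * W for _ in range(H)]  # toy stand-in for the watermarked image
clean = [[0.0] * W for _ in range(H)]        # toy stand-in for the watermark-free image
mask = random_rect_mask(H, W, rng)           # doubles as the localization label
fused = fuse_pair(watermarked, clean, mask)
```

Under these assumptions, each `(fused, mask)` pair is a self-labeled training sample: the mask that built the image is also the ground truth a localization network would be asked to predict.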

Key Contributions

Multiplexing Watermark VAE (MPW-VAE) using a lightweight latent residual-based adapter that generates paired watermarked/watermark-free images from a pretrained LDM VAE without modifying the diffusion backbone
Mixture-of-Experts Guided Forensic Network (MoE-GFN) that integrates holistic watermark patterns, local tampering traces, and frequency-domain cues for simultaneous watermark verification and pixel-level tamper localization
End-to-end self-supervised co-training between MPW-VAE and MoE-GFN using mask-fused paired images, eliminating post hoc processing and improving forensic reliability
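As a rough illustration of how a mixture-of-experts layer can combine the three kinds of cues listed above, the sketch below uses a generic softmax gate over three expert feature vectors. It shows the general gating technique, not the MoE-GFN architecture itself; the function names, feature sizes, and gate scores are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_features, gate_scores):
    """Return the gate-weighted sum of expert feature vectors and the gate weights."""
    weights = softmax(gate_scores)
    dim = len(expert_features[0])
    mixed = [
        sum(w * feat[i] for w, feat in zip(weights, expert_features))
        for i in range(dim)
    ]
    return mixed, weights


# Toy features; in a real network these would come from learned expert branches.
experts = [
    [1.0, 0.0],  # hypothetical holistic watermark-pattern features
    [0.0, 1.0],  # hypothetical local tampering-trace features
    [0.5, 0.5],  # hypothetical frequency-domain features
]
mixed, weights = mix_experts(experts, gate_scores=[0.0, 0.0, 0.0])
```

With equal gate scores every expert contributes equally. In a trained network the gate scores would depend on the input, which is what lets the mixture shift emphasis between cues per image.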

🛡️ Threat Analysis

Output Integrity Attack

StableGuard watermarks AI-generated image outputs (not model weights) from Latent Diffusion Models for copyright protection and content provenance. The MoE-GFN forensic network then verifies these content watermarks and localizes tampered regions, which makes this classic output integrity and content authenticity work.
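Verification of a binary watermark is commonly scored by bit accuracy, the fraction of extracted bits that match the embedded message. The sketch below assumes that metric and a hypothetical decision threshold of 0.9; the summary above does not state which metric or threshold StableGuard uses.

```python
def bit_accuracy(extracted, reference):
    """Fraction of positions where the extracted bits equal the reference bits."""
    if len(extracted) != len(reference) or not reference:
        raise ValueError("bit strings must be non-empty and of equal length")
    matches = sum(1 for e, r in zip(extracted, reference) if e == r)
    return matches / len(reference)


def is_verified(extracted, reference, threshold=0.9):
    """Accept the watermark if bit accuracy reaches the (assumed) threshold."""
    return bit_accuracy(extracted, reference) >= threshold


reference_bits = [1, 0, 1, 1]
extracted_bits = [1, 0, 0, 1]  # one bit flipped, e.g. by tampering or compression
accuracy = bit_accuracy(extracted_bits, reference_bits)
accepted = is_verified(extracted_bits, reference_bits)
```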

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

inference_time, digital

Applications

ai-generated image copyright protection, digital image forensics, tamper localization

Similar Papers

SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models

Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity

VideoGuard: Protecting Video Content from Unauthorized Editing

TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery

OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

PhaseMark: A Post-hoc, Optimization-Free Watermarking of AI-generated Images in the Latent Frequency Domain

Guidance Watermarking for Diffusion Models

Authenticated Contradictions from Desynchronized Provenance and Watermarking