Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks
Wenkai Fu, Finn Carter, Yue Wang, Emily Davis, Bo Zhang
Published on arXiv (arXiv:2511.05598)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Diffusion-based editing reduces watermark decoding accuracy to near-zero levels across StegaStamp, TrustMark, and VINE while preserving high visual fidelity, revealing a fundamental vulnerability in robust watermarking against generative AI edits.
Guided Diffusion Attack
Novel technique introduced
Robust invisible watermarking aims to embed hidden messages into images such that they survive various manipulations while remaining imperceptible. However, powerful diffusion-based image generation and editing models now enable realistic content-preserving transformations that can inadvertently remove or distort embedded watermarks. In this paper, we present a theoretical and empirical analysis demonstrating that diffusion-based image editing can effectively break state-of-the-art robust watermarks designed to withstand conventional distortions. We analyze how the iterative noising and denoising process of diffusion models degrades embedded watermark signals, and provide formal proofs that under certain conditions a diffusion model's regenerated image retains virtually no detectable watermark information. Building on this insight, we propose a diffusion-driven attack that uses generative image regeneration to erase watermarks from a given image. Furthermore, we introduce an enhanced \emph{guided diffusion} attack that explicitly targets the watermark during generation by integrating the watermark decoder into the sampling loop. We evaluate our approaches on multiple recent deep learning watermarking schemes (e.g., StegaStamp, TrustMark, and VINE) and demonstrate that diffusion-based editing can reduce watermark decoding accuracy to near-zero levels while preserving high visual fidelity of the images. Our findings reveal a fundamental vulnerability in current robust watermarking techniques against generative model-based edits, underscoring the need for new watermarking strategies in the era of generative AI.
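The unguided regeneration attack described above can be illustrated with a minimal numpy sketch. The watermark scheme, decoder, and "denoiser" below are toy stand-ins chosen for self-containment: a spread-spectrum mark replaces the learned encoders (StegaStamp, TrustMark, VINE), the DDPM forward step adds noise up to a chosen timestep, and a low-pass filter plays the role of the pretrained diffusion model's denoiser, which regenerates low-frequency content while discarding the high-frequency watermark residual.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy spread-spectrum watermark (illustrative only; the paper attacks
# learned schemes such as StegaStamp, TrustMark, and VINE).
H = W = 64
n_bits = 16
block = H * W // n_bits          # pixels carrying each message bit

# Each message bit modulates a pseudo-random +/-1 carrier pattern.
carrier = rng.choice([-1.0, 1.0], size=(n_bits, block))

def embed(image, bits, strength=0.3):
    wm = ((2.0 * bits - 1.0)[:, None] * carrier).reshape(H, W)
    return image + strength * wm

def decode(image):
    # Correlate each pixel block with its carrier; the sign recovers the bit.
    corr = np.sum(image.reshape(n_bits, block) * carrier, axis=1)
    return (corr > 0).astype(int)

def regenerate(image, abar_t=0.3, k=7):
    """Unguided regeneration attack: DDPM forward noising to timestep t,
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, then a stand-in
    low-pass 'denoiser' that keeps coarse content and drops the
    high-frequency watermark residual (a crude proxy for a real model)."""
    eps = rng.normal(size=image.shape)
    xt = np.sqrt(abar_t) * image + np.sqrt(1.0 - abar_t) * eps
    padded = np.pad(xt / np.sqrt(abar_t), k // 2, mode="edge")
    out = np.empty_like(image)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

base = rng.normal(size=(H, W))           # stand-in cover image
bits = rng.integers(0, 2, size=n_bits)
xw = embed(base, bits)

acc_before = np.mean(decode(xw) == bits)
acc_after = np.mean(decode(regenerate(xw)) == bits)
print(f"bit accuracy before attack: {acc_before:.2f}, after: {acc_after:.2f}")
```

Even this crude proxy shows the mechanism the paper formalizes: the forward noising injects entropy that swamps the watermark signal, and the regeneration step reconstructs only content it can model, leaving decoding near chance.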
Key Contributions
- Formal theoretical proof that ideal diffusion regeneration eliminates mutual information between watermarked image and embedded message, reducing decoding to random chance
- Unguided diffusion regeneration attack that erases watermarks by passing images through a pretrained diffusion model's noise-denoise cycle
- Guided diffusion attack that integrates the watermark decoder as an adversarial guide during generation to actively maximize watermark signal erasure while preserving visual fidelity
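The guided attack in the last contribution can be sketched in the same toy setting. Here a linear correlation "decoder" stands in for the neural watermark decoder (the paper backpropagates through the real network), and each sampling step combines mild re-noising, crudely modeling the generative update, with a gradient step that drives every decoder logit toward zero. All names and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy differentiable "decoder": bit logits are correlations with known
# carrier patterns (a linear stand-in for the real decoder network).
n_bits, dim = 8, 512
carrier = rng.choice([-1.0, 1.0], size=(n_bits, dim))

def decoder_logits(x):
    return carrier @ x                      # one logit per message bit

def guidance_grad(x):
    # Gradient of the adversarial objective sum(logits**2) w.r.t. x;
    # descending it actively pushes every decoder logit toward zero.
    return 2.0 * carrier.T @ (carrier @ x)

bits = rng.integers(0, 2, size=n_bits)
signs = 2.0 * bits - 1.0
x = rng.normal(size=dim) + 0.2 * signs @ carrier   # watermarked signal

acc_before = np.mean((decoder_logits(x) > 0).astype(int) == bits)

# Guided "sampling" loop: each step mixes mild re-noising (standing in
# for the diffusion update) with the decoder-guided erasure step.
eta = 5e-4
for _ in range(50):
    x = x + 0.05 * rng.normal(size=dim) - eta * guidance_grad(x)

acc_after = np.mean((decoder_logits(x) > 0).astype(int) == bits)
print(f"bit accuracy before: {acc_before:.2f}, after guidance: {acc_after:.2f}")
```

The guidance term only perturbs the subspace the decoder is sensitive to, which is why (in the paper's full version) the attack can suppress the watermark more aggressively than blind regeneration while still preserving visual fidelity.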
🛡️ Threat Analysis
The paper directly attacks content watermarks (StegaStamp, TrustMark, VINE) embedded in images for copyright protection and content authentication — watermark removal is the canonical ML09 threat. Both the unguided regeneration attack and the guided diffusion attack that integrates the watermark decoder into the sampling loop are attacks on output integrity schemes.