On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing

Robust invisible watermarking embeds hidden information in images such that the watermark can survive various manipulations. However, the emergence of powerful diffusion-based image generation and editing techniques poses a new threat to these watermarking schemes. In this paper, we investigate the intersection of diffusion-based image editing and robust image watermarking. We analyze how diffusion-driven image edits can significantly degrade or even fully remove embedded watermarks from state-of-the-art robust watermarking systems, and we provide both theoretical analysis and empirical experiments. We prove that as an image undergoes iterative diffusion transformations, the mutual information between the watermarked image and the embedded payload approaches zero, causing watermark decoding to fail. We further propose a guided diffusion attack algorithm that explicitly targets and erases watermark signals during generation. We evaluate our approach on recent deep learning-based watermarking schemes and demonstrate near-zero watermark recovery rates after attack, while maintaining high visual fidelity of the regenerated images. Finally, we discuss the ethical implications of such watermark removal capabilities and provide design guidelines for making future watermarking strategies more resilient in the era of generative AI.
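
The mutual-information claim can be illustrated with the data processing inequality. What follows is a minimal sketch, not the paper's proof: it assumes (the abstract does not state this) that the payload $M$, the watermarked image $X_0$, and the successive diffusion outputs $X_1, \dots, X_T$ form a Markov chain. All symbols are illustrative.

```latex
M \to X_0 \to X_1 \to \cdots \to X_T
\quad\Longrightarrow\quad
I(M; X_T) \le I(M; X_{T-1}) \le \cdots \le I(M; X_0) \le H(M).
```

The inequality alone shows only that information about the payload cannot increase along the chain. Concluding that $I(M; X_T) \to 0$ additionally requires each step to be strictly contractive, for example because it injects independent noise.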

Key Contributions

Information-theoretic proof that iterative diffusion transformations drive mutual information between watermarked image and embedded payload to zero
Guided diffusion attack algorithm that explicitly targets and erases watermark signals during the diffusion generation process
Empirical evaluation on StegaStamp, TrustMark, and VINE demonstrating near-zero watermark recovery rates with high visual fidelity preserved
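
The first contribution can be made concrete with a toy calculation. The sketch below is not the paper's derivation. It assumes a standard forward noising step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with unit-variance Gaussian eps, a linear beta schedule, a watermark perturbation with average per-pixel power `watermark_power`, and a decoder that knows the cover image (the most favorable case for decoding). Under those assumptions the Gaussian channel capacity bounds the payload information that can survive per pixel. The function names and default values are hypothetical.

```python
import math


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule.

    The schedule and its default values are assumptions for illustration.
    """
    alpha_bars = []
    running = 1.0
    denom = max(num_steps - 1, 1)  # avoid division by zero for a single step
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / denom
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def mi_upper_bound_bits(alpha_bar, watermark_power):
    """Upper bound, in bits per pixel, on payload information after noising.

    The noised watermark is an additive Gaussian channel with signal power
    alpha_bar * watermark_power and noise variance 1 - alpha_bar, so the
    capacity formula 0.5 * log2(1 + SNR) bounds the mutual information.
    """
    snr = alpha_bar * watermark_power / (1.0 - alpha_bar)
    return 0.5 * math.log2(1.0 + snr)


if __name__ == "__main__":
    schedule = alpha_bar_schedule()
    for step in (0, 249, 499, 749, 999):
        bound = mi_upper_bound_bits(schedule[step], watermark_power=0.01)
        print(f"step {step + 1:4d}: at most {bound:.3e} bits per pixel")
```

The bound shrinks monotonically as more noise is injected. Any denoising applied afterwards is post-processing of the noised image, so by the data processing inequality it cannot restore payload information that the noising step destroyed.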

🛡️ Threat Analysis

Output Integrity Attack

The paper proposes and analyzes attacks that remove or defeat content watermarks (StegaStamp, TrustMark, VINE) embedded in images for provenance tracking. Watermark removal attacks on output integrity schemes fall explicitly under ML09. The guided diffusion attack erases embedded payload signals, and the paper provides an information-theoretic proof that the mutual information between the watermarked image and the payload approaches zero under iterative diffusion.

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

grey_box, inference_time, digital

Applications

2026, 0 citations
