The Coding Limits of Robust Watermarking for Generative Models
Danilo Francati 1, Yevin Nikhel Goonatilake 2, Shubham Pawar 3, Daniele Venturi 1, Giuseppe Ateniese 2
Published on arXiv
2509.10577
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
A simple crop-and-resize operation flips approximately half of the encoded latent signs in the image watermarking scheme of Gunn et al. (ICLR 2025), reliably erasing the watermark. This is consistent with the proved limit that no binary cryptographic watermark survives adversarial modification of more than 50% of its bits.
Zero-bit tamper-detection code
Novel technique introduced
We ask a basic question about cryptographic watermarking for generative models: to what extent can a watermark remain reliable when an adversary is allowed to corrupt the encoded signal? To study this question, we introduce a minimal coding abstraction that we call a zero-bit tamper-detection code. This is a secret-key procedure that samples a pseudorandom codeword and, given a candidate word, decides whether it should be treated as unmarked content or as the result of tampering with a valid codeword. It captures the two core requirements of robust watermarking: soundness and tamper detection. Within this abstraction we prove a sharp unconditional limit on robustness to independent symbol corruption. For an alphabet of size $q$, there is a critical corruption rate of $1 - 1/q$ such that no scheme with soundness, even relaxed to allow a fixed constant false positive probability on random content, can reliably detect tampering once an adversary can change more than this fraction of symbols. In particular, in the binary case no cryptographic watermark can remain robust if more than half of the encoded bits are modified. We also show that this threshold is tight by giving simple information-theoretic constructions that achieve soundness and tamper detection for all strictly smaller corruption rates. We then test experimentally whether this limit appears in practice by examining the recent image watermarking scheme of Gunn, Zhao, and Song (ICLR 2025). We show that a simple crop-and-resize operation reliably flips about half of the latent signs and consistently prevents belief-propagation decoding from recovering the codeword, erasing the watermark while leaving the image visually intact.
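One way to see where the $1 - 1/q$ threshold comes from is the following sketch. It is an illustration under an assumed corruption channel, not the paper's proof. Suppose each codeword symbol $x_i$ is independently kept with probability $1/q$ and otherwise replaced by a uniformly random different symbol, so that a $1 - 1/q$ fraction of symbols changes in expectation. Then, for the corrupted symbol $y_i$ and any alphabet symbol $a$:

$$
\Pr[y_i = a] =
\begin{cases}
\dfrac{1}{q} & \text{if } a = x_i, \\[6pt]
\left(1 - \dfrac{1}{q}\right) \cdot \dfrac{1}{q - 1} = \dfrac{1}{q} & \text{if } a \neq x_i.
\end{cases}
$$

Every symbol of the corrupted word is therefore uniform and independent of the codeword, so the detector sees exactly the distribution of random content. A detector that flags random content with probability at most some constant $\varepsilon$ (a symbol introduced here for illustration) can flag this tampered word with probability at most $\varepsilon$ as well. For $q = 2$ the rate is $1/2$, matching the binary case stated above.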
Key Contributions
- Introduces the zero-bit tamper-detection code abstraction and proves a sharp critical corruption threshold of 1−1/q for alphabet size q, above which reliable tamper detection is impossible even with relaxed soundness
- Shows this threshold is tight via information-theoretic constructions that achieve soundness and tamper detection for all strictly smaller corruption rates
- Demonstrates experimentally that a simple crop-and-resize attack flips ~50% of latent signs in Gunn et al.'s image watermarking system, consistently preventing belief-propagation decoding while leaving the image visually intact
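The binary threshold in the contributions above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's construction and not the Gunn et al. decoder: the detector simply stores the secret codeword and flags a word when its agreement with that codeword exceeds 1/2 by a margin. The function names (`sample_codeword`, `flip_bits`, `agreement`, `detect`) and the parameters `n` and `margin` are hypothetical choices made for this example.

```python
import random


def sample_codeword(n, rng):
    """Sample a uniformly random n-bit word (stands in for the secret codeword)."""
    return [rng.randrange(2) for _ in range(n)]


def flip_bits(word, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in word]


def agreement(a, b):
    """Fraction of positions where the two words carry the same bit."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def detect(word, codeword, margin):
    """Flag `word` as derived from `codeword` if agreement exceeds 1/2 + margin.

    Random content agrees with the codeword on about half of the positions,
    so a positive margin keeps the false positive rate small for large n.
    """
    return agreement(word, codeword) > 0.5 + margin


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so the run is reproducible
    n, margin = 4096, 0.1    # illustrative values, assumed for this sketch
    codeword = sample_codeword(n, rng)

    for rate in (0.1, 0.3, 0.5):
        tampered = flip_bits(codeword, rate, rng)
        print(f"flip rate {rate:.1f}: "
              f"agreement={agreement(tampered, codeword):.3f}, "
              f"flagged={detect(tampered, codeword, margin)}")

    unrelated = sample_codeword(n, rng)
    print(f"random word:   agreement={agreement(unrelated, codeword):.3f}, "
          f"flagged={detect(unrelated, codeword, margin)}")
```

At a flip rate of 1/2 the tampered word agrees with the codeword on about half of its positions, just like unrelated random content, so no choice of margin separates the two. For a flip rate p below 1/2, expected agreement is 1 − p, so a margin smaller than 1/2 − p together with a sufficiently large `n` separates tampered words from random ones in this toy setting.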
🛡️ Threat Analysis
Directly attacks content watermarking schemes for generative model outputs: establishes fundamental impossibility bounds and demonstrates a concrete watermark removal attack (crop and resize) on the image watermarking scheme of Gunn et al. (ICLR 2025) that erases the watermark while preserving visual quality.