attack 2025

The Coding Limits of Robust Watermarking for Generative Models

Danilo Francati ¹, Yevin Nikhel Goonatilake ², Shubham Pawar ³, Daniele Venturi ¹, Giuseppe Ateniese ²

¹ Sapienza University of Rome

² George Mason University

³ Royal Holloway, University of London

0 citations

Published on arXiv

2509.10577

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

A simple crop-and-resize operation flips approximately half of encoded latent signs in Gunn et al.'s (ICLR 2025) image watermarking scheme, reliably erasing the watermark — consistent with the proved theoretical limit that no binary cryptographic watermark survives adversarial modification of more than 50% of its bits

Zero-bit tamper-detection code

Novel technique introduced

We ask a basic question about cryptographic watermarking for generative models: to what extent can a watermark remain reliable when an adversary is allowed to corrupt the encoded signal? To study this question, we introduce a minimal coding abstraction that we call a zero-bit tamper-detection code. This is a secret-key procedure that samples a pseudorandom codeword and, given a candidate word, decides whether it should be treated as unmarked content or as the result of tampering with a valid codeword. It captures the two core requirements of robust watermarking: soundness and tamper detection. Within this abstraction we prove a sharp unconditional limit on robustness to independent symbol corruption. For an alphabet of size $q$, there is a critical corruption rate of $1 - 1/q$ such that no scheme with soundness, even relaxed to allow a fixed constant false positive probability on random content, can reliably detect tampering once an adversary can change more than this fraction of symbols. In particular, in the binary case no cryptographic watermark can remain robust if more than half of the encoded bits are modified. We also show that this threshold is tight by giving simple information-theoretic constructions that achieve soundness and tamper detection for all strictly smaller corruption rates. We then test experimentally whether this limit appears in practice by examining the recent image watermarking scheme of Gunn, Zhao, and Song (ICLR 2025). We show that a simple crop-and-resize operation reliably flips about half of the latent signs and consistently prevents belief-propagation decoding from recovering the codeword, erasing the watermark while leaving the image visually intact.
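
As a rough intuition for why $1 - 1/q$ is the critical rate, consider the following worked calculation. It is a sketch under an assumed channel model (each symbol is independently replaced, with probability $p$, by a uniformly random different symbol) and is not the paper's proof; the symbols $c$, $y$, $s$, $p$, and $\Sigma$ are introduced here for illustration only.

```latex
% Assumed channel on an alphabet \Sigma with |\Sigma| = q:
% with probability p, the codeword symbol c is replaced by a uniform element of \Sigma \setminus \{c\}.
\[
\Pr[y = c] = 1 - p,
\qquad
\Pr[y = s] = \frac{p}{q-1} \quad \text{for each } s \neq c .
\]
% At the critical rate p = 1 - 1/q:
\[
\Pr[y = c] = \frac{1}{q},
\qquad
\Pr[y = s] = \frac{1 - 1/q}{q-1} = \frac{(q-1)/q}{q-1} = \frac{1}{q}.
\]
```

At this rate every output symbol is uniform on the alphabet regardless of the codeword symbol it came from, so the corrupted word has the same distribution as random content, which a sound detector must label as unmarked. For $q = 2$ the rate is $1/2$, matching the binary statement above.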

Key Contributions

Introduces the zero-bit tamper-detection code abstraction and proves a sharp critical corruption threshold of 1−1/q for alphabet size q, above which reliable tamper detection is impossible even with relaxed soundness
Shows this threshold is tight via information-theoretic constructions that achieve soundness and tamper detection for all strictly smaller corruption rates
Demonstrates experimentally that a simple crop-and-resize attack flips ~50% of latent signs in Gunn et al.'s image watermarking system, consistently preventing belief-propagation decoding while leaving the image visually intact
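
The binary threshold in the contributions above can be illustrated with a small simulation. This is a minimal sketch, assuming a plain correlation detector over a random ±1 codeword; it is not the paper's construction, and the names `sample_codeword`, `flip_signs`, `detect`, and the `num_sigmas` parameter are hypothetical.

```python
import numpy as np

def sample_codeword(n, rng):
    """Sample a uniformly random +/-1 word (stand-in for a pseudorandom codeword)."""
    return rng.choice(np.array([-1, 1]), size=n)

def flip_signs(word, rate, rng):
    """Flip each sign independently with probability `rate`."""
    flips = rng.random(word.size) < rate
    return np.where(flips, -word, word)

def detect(word, codeword, num_sigmas=5.0):
    """Assumed detector: flag `word` if its normalized correlation with the
    secret codeword exceeds `num_sigmas` standard deviations of a random word."""
    n = codeword.size
    score = float(np.mean(word * codeword))
    return bool(score > num_sigmas / np.sqrt(n)), score

rng = np.random.default_rng(0)
n = 16384
codeword = sample_codeword(n, rng)
for rate in (0.0, 0.3, 0.45, 0.5):
    flagged, score = detect(flip_signs(codeword, rate, rng), codeword)
    print(f"flip rate {rate:.2f}: score {score:+.3f}, detected={flagged}")
```

Under this model the expected score is 1 − 2p for flip rate p, so it stays above the random-content noise floor for any fixed p below 1/2 once n is large enough, and is centered at zero at p = 1/2, where the corrupted word looks like random content to the detector.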

🛡️ Threat Analysis

Output Integrity Attack

Directly attacks content watermarking schemes for generative model outputs — establishes fundamental impossibility bounds and demonstrates a concrete watermark removal attack (crop and resize) on Gunn et al. ICLR 2025 image watermarking that erases the watermark while preserving visual quality.
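
One plausible intuition for why a geometric edit can flip about half of the signs is sketched below. This is a toy model and not the authors' experiment: it crops and resizes an i.i.d. Gaussian grid directly, whereas the attack described above acts on the image and the latent is then recovered by the watermark detector. The helpers `crop_and_resize_nearest` and `sign_flip_fraction` are hypothetical.

```python
import numpy as np

def crop_and_resize_nearest(grid, crop):
    """Remove `crop` cells from every border, then resize back to the
    original shape with nearest-neighbour sampling."""
    h, w = grid.shape
    inner = grid[crop:h - crop, crop:w - crop]
    rows = np.arange(h) * inner.shape[0] // h
    cols = np.arange(w) * inner.shape[1] // w
    return inner[np.ix_(rows, cols)]

def sign_flip_fraction(a, b):
    """Fraction of positions where the signs of `a` and `b` disagree."""
    return float(np.mean(np.sign(a) != np.sign(b)))

rng = np.random.default_rng(0)
latent = rng.standard_normal((64, 64))   # toy stand-in for a latent grid
attacked = crop_and_resize_nearest(latent, crop=8)
print("fraction of signs flipped:", sign_flip_fraction(latent, attacked))
```

In this toy setting the crop shifts almost every position onto a different, independent entry, so the signs agree only by chance and the disagreement rate lands near one half.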

Details

Domains

generative, vision

Model Types

diffusion

Threat Tags

black_box, inference_time, digital

Applications

image watermarking, generative model content provenance

Similar Papers

MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation

RAVEN: Erasing Invisible Watermarks via Novel View Synthesis

Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

Diffusion-Based Image Editing for Breaking Robust Watermarks

Hide&Seek: Remove Image Watermarks with Negligible Cost via Pixel-wise Reconstruction

TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance

Identifying Models Behind Text-to-Image Leaderboards

SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark