Where is the Watermark? Interpretable Watermark Detection at the Block Level
Maria Bulychev, Neil G. Marchant, Benjamin I. P. Rubinstein
Published on arXiv
2512.14994
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Watermarks remain robust to cropping of up to half the image, and block-level detection maps make detection more interpretable than in prior post-hoc methods while retaining competitive robustness.
Block-level DWT Watermarking
Novel technique introduced
Recent advances in generative AI have enabled the creation of highly realistic digital content, raising concerns around authenticity, ownership, and misuse. While watermarking has become an increasingly important mechanism to trace and protect digital media, most existing image watermarking schemes operate as black boxes, producing global detection scores without offering any insight into how or where the watermark is present. This lack of transparency impacts user trust and makes it difficult to interpret the impact of tampering. In this paper, we present a post-hoc image watermarking method that combines localised embedding with region-level interpretability. Our approach embeds watermark signals in the discrete wavelet transform domain using a statistical block-wise strategy. This allows us to generate detection maps that reveal which regions of an image are likely watermarked or altered. We show that our method achieves strong robustness against common image transformations while remaining sensitive to semantic manipulations. At the same time, the watermark remains highly imperceptible. Compared to prior post-hoc methods, our approach offers more interpretable detection while retaining competitive robustness. For example, our watermarks are robust to cropping up to half the image.
Key Contributions
- Post-hoc image watermarking method that embeds signals in the discrete wavelet transform (DWT) domain using a statistical block-wise strategy
- Region-level detection maps that reveal which blocks of an image are watermarked or semantically altered, providing interpretable tamper localization
- Demonstration of robustness to common image transformations (e.g., cropping up to 50% of the image) while remaining sensitive to semantic manipulations and maintaining imperceptibility
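
To make the block-wise idea in these contributions concrete, here is a minimal sketch of embedding in a wavelet subband per block and reading back a per-block detection map. It is not the authors' algorithm: the summary does not specify the wavelet, subband, block size, or statistical test, so the Haar wavelet, the diagonal detail subband, the keyed ±1 pattern, the correlation threshold, and every name here (`BLOCK`, `STRENGTH`, `embed`, `detection_map`, `demo_image`) are assumptions chosen for illustration. Pixels are kept as floats, and no rounding or clipping to 8-bit values is modelled.

```python
import random

BLOCK = 8       # block side in pixels (hypothetical choice)
STRENGTH = 4.0  # embedding strength (hypothetical choice)

def haar2d(block):
    """One-level 2D Haar DWT: approximation plus three detail subbands."""
    n = len(block) // 2
    ll, dh, dv, dd = ([[0.0] * n for _ in range(n)] for _ in range(4))
    for i in range(n):
        for j in range(n):
            a, b = block[2 * i][2 * j], block[2 * i][2 * j + 1]
            c, d = block[2 * i + 1][2 * j], block[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2
            dh[i][j] = (a - b + c - d) / 2
            dv[i][j] = (a + b - c - d) / 2
            dd[i][j] = (a - b - c + d) / 2  # diagonal detail
    return ll, dh, dv, dd

def ihaar2d(ll, dh, dv, dd):
    """Exact inverse of haar2d."""
    n = len(ll)
    out = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            s, h, v, d = ll[i][j], dh[i][j], dv[i][j], dd[i][j]
            out[2 * i][2 * j] = (s + h + v + d) / 2
            out[2 * i][2 * j + 1] = (s - h + v - d) / 2
            out[2 * i + 1][2 * j] = (s + h - v - d) / 2
            out[2 * i + 1][2 * j + 1] = (s - h - v + d) / 2
    return out

def pattern(key, bi, bj):
    """Keyed pseudo-random +/-1 pattern for block (bi, bj)."""
    rng = random.Random(f"{key}:{bi}:{bj}")
    n = BLOCK // 2
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(n)]

def get_block(img, bi, bj):
    rows = img[bi * BLOCK:(bi + 1) * BLOCK]
    return [row[bj * BLOCK:(bj + 1) * BLOCK] for row in rows]

def embed(img, key):
    """Add the keyed pattern to the diagonal subband of every block."""
    out = [row[:] for row in img]
    for bi in range(len(img) // BLOCK):
        for bj in range(len(img[0]) // BLOCK):
            ll, dh, dv, dd = haar2d(get_block(img, bi, bj))
            p = pattern(key, bi, bj)
            n = len(p)
            dd = [[dd[i][j] + STRENGTH * p[i][j] for j in range(n)]
                  for i in range(n)]
            blk = ihaar2d(ll, dh, dv, dd)
            for r in range(BLOCK):
                out[bi * BLOCK + r][bj * BLOCK:(bj + 1) * BLOCK] = blk[r]
    return out

def detection_map(img, key, threshold=STRENGTH / 2):
    """Per-block decision: True where the keyed pattern correlates strongly."""
    result = []
    for bi in range(len(img) // BLOCK):
        row = []
        for bj in range(len(img[0]) // BLOCK):
            dd = haar2d(get_block(img, bi, bj))[3]
            p = pattern(key, bi, bj)
            n = len(p)
            score = sum(dd[i][j] * p[i][j]
                        for i in range(n) for j in range(n)) / (n * n)
            row.append(score > threshold)
        result.append(row)
    return result

def demo_image(size=32, seed=0):
    """Synthetic grayscale image: smooth gradient plus small noise."""
    rng = random.Random(seed)
    return [[100.0 + 2 * (x + y) + rng.uniform(-1, 1) for x in range(size)]
            for y in range(size)]

# Usage: watermark an image, overwrite one region, and inspect the map.
cover = demo_image()
marked = embed(cover, key="demo-key")
tampered = [row[:] for row in marked]
for y in range(8, 16):
    for x in range(16, 32):
        tampered[y][x] = 128.0  # simulated local edit
tamper_map = detection_map(tampered, "demo-key")
```

In this toy setup `tamper_map` is `False` only for the two blocks covering the overwritten region, which is the kind of localized signal a detection map is meant to expose. Because the pattern is keyed to absolute block position, this sketch would not survive cropping; the crop robustness reported in the paper depends on design details not given in this summary.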
🛡️ Threat Analysis
Embeds watermarks in image content (DWT domain) to trace provenance and detect tampering — this is content watermarking for output integrity/authenticity, not model weight watermarking. The detection maps that localize altered regions are a direct output integrity mechanism.