Where is the Watermark? Interpretable Watermark Detection at the Block Level

Recent advances in generative AI have enabled the creation of highly realistic digital content, raising concerns around authenticity, ownership, and misuse. While watermarking has become an increasingly important mechanism to trace and protect digital media, most existing image watermarking schemes operate as black boxes, producing global detection scores without offering any insight into how or where the watermark is present. This lack of transparency impacts user trust and makes it difficult to interpret the impact of tampering. In this paper, we present a post-hoc image watermarking method that combines localised embedding with region-level interpretability. Our approach embeds watermark signals in the discrete wavelet transform domain using a statistical block-wise strategy. This allows us to generate detection maps that reveal which regions of an image are likely watermarked or altered. We show that our method achieves strong robustness against common image transformations while remaining sensitive to semantic manipulations. At the same time, the watermark remains highly imperceptible. Compared to prior post-hoc methods, our approach offers more interpretable detection while retaining competitive robustness. For example, our watermarks are robust to cropping up to half the image.

Key Contributions

Post-hoc image watermarking method that embeds in the discrete wavelet transform (DWT) domain using a statistical block-wise strategy
Region-level detection maps that reveal which blocks of an image are watermarked or semantically altered, providing interpretable tamper localization
Demonstration of robustness to common image transformations (e.g., cropping up to 50% of the image) while remaining sensitive to semantic manipulations and maintaining imperceptibility
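
To make the block-level idea concrete, here is a minimal sketch of block-wise embedding and a per-block detection map. It is not the paper's method: the summary does not specify the statistical embedding rule, so this sketch assumes a simple additive keyed ±1 pattern in the diagonal detail band of a one-level Haar transform, with a correlation test per block. The function names, the `key`, `alpha`, and `block` parameters, and the synthetic test image are all illustrative assumptions, and pixel values are kept as floats without rounding or clipping.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar transform; returns (ll, d1, d2, hh)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # averages of adjacent row pairs
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # differences of adjacent row pairs
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    d1 = (a[:, 0::2] - a[:, 1::2]) / 2.0
    d2 = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail band
    return ll, d1, d2, hh

def haar_idwt2(ll, d1, d2, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + d1, ll - d1
    d[:, 0::2], d[:, 1::2] = d2 + hh, d2 - hh
    img = np.empty((2 * h, 2 * w))
    img[0::2, :], img[1::2, :] = a + d, a - d
    return img

def keyed_pattern(shape, key):
    """Pseudo-random +/-1 pattern derived from a secret key (assumed scheme)."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=shape)

def embed(img, key, alpha=2.0):
    """Add alpha * pattern to the diagonal detail band (even-sized images only)."""
    ll, d1, d2, hh = haar_dwt2(img.astype(float))
    return haar_idwt2(ll, d1, d2, hh + alpha * keyed_pattern(hh.shape, key))

def detection_map(img, key, block=8, alpha=2.0):
    """Boolean grid: True where a block correlates with the keyed pattern."""
    hh = haar_dwt2(img.astype(float))[3]
    rows, cols = hh.shape[0] // block, hh.shape[1] // block
    corr = (hh * keyed_pattern(hh.shape, key))[: rows * block, : cols * block]
    scores = corr.reshape(rows, block, cols, block).mean(axis=(1, 3))
    return scores > alpha / 2.0

# Synthetic host: a smooth gradient plus mild noise (stand-in for a real image).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
host = 2.0 * xx + yy + rng.normal(0.0, 1.0, (64, 64))

marked = embed(host, key=1234)
tampered = marked.copy()
tampered[:16, :16] = 128.0   # overwrite one block-aligned region with flat content

clean_map = detection_map(host, key=1234)
marked_map = detection_map(marked, key=1234)
tampered_map = detection_map(tampered, key=1234)
print(tampered_map.astype(int))
```

In this toy setting the map for the unmarked host is empty, the map for the marked image is full, and overwriting the top-left region clears only the corresponding block, which is the kind of localisation the detection maps are meant to provide. A real image has far more energy in its detail bands than this synthetic host, so a practical detector would need a properly calibrated statistical test rather than the fixed `alpha / 2` threshold assumed here.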

🛡️ Threat Analysis

Output Integrity Attack

Embeds watermarks in image content (DWT domain) to trace provenance and detect tampering. This is content watermarking for output integrity and authenticity, not model weight watermarking. The detection maps that localize altered regions are a direct output integrity mechanism.

Details

Domains

vision

Threat Tags

inference_time, digital

Applications

2026 0 cit.

Output Integrity Attack

80%

Similar Papers

A Novel Local Focusing Mechanism for Deepfake Detection Generalization

Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection

FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos

Wavelet-based GAN Fingerprint Detection using ResNet50

Attack-Aware Deepfake Detection under Counter-Forensic Manipulations

Morphology-optimized Multi-Scale Fusion: Combining Local Artifacts and Mesoscopic Semantics for Deepfake Detection and Localization

Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection

RCDN: Real-Centered Detection Network for Robust Face Forgery Identification