RAVEN: Erasing Invisible Watermarks via Novel View Synthesis
Fahad Shamshad 1, Nils Lukas 1, Karthik Nandakumar 1,2
Published on arXiv: 2601.08832
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Outperforms 14 baseline watermark removal attacks across 15 watermarking schemes (including SynthID-style semantic watermarks) while preserving superior perceptual quality, operating zero-shot without watermark or detector knowledge.
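The finding reports "watermark suppression" but does not name the metric at this point. For multi-bit watermarks, a common convention is to decode the embedded message from the attacked image and compute bit accuracy, declaring the watermark detected when accuracy exceeds a threshold. Zero-bit schemes instead score a detection statistic, which this sketch does not cover. The minimal sketch below assumes the bit-accuracy convention; the helper names and the 0.8 threshold are hypothetical, and the bit strings are toy values rather than results from the paper.

```python
import numpy as np

def bit_accuracy(decoded_bits, true_bits):
    """Fraction of decoded watermark bits that match the embedded message."""
    decoded = np.asarray(decoded_bits, dtype=bool)
    true = np.asarray(true_bits, dtype=bool)
    if decoded.shape != true.shape:
        raise ValueError("bit arrays must have the same shape")
    return float(np.mean(decoded == true))

def is_detected(decoded_bits, true_bits, threshold=0.8):
    # Hypothetical threshold: the watermark counts as present when bit
    # accuracy exceeds it (real systems tie this to a target false-positive rate).
    return bit_accuracy(decoded_bits, true_bits) > threshold

def removal_success_rate(decoded_batch, true_bits, threshold=0.8):
    """Share of attacked images whose watermark is no longer detected."""
    misses = [not is_detected(d, true_bits, threshold) for d in decoded_batch]
    return float(np.mean(misses))

# Toy 8-bit example (synthetic, not data from the paper).
message = [1, 0, 1, 1, 0, 0, 1, 0]
clean_decode = [1, 0, 1, 1, 0, 0, 1, 0]      # watermark intact
attacked_decode = [0, 0, 1, 0, 1, 0, 0, 1]   # watermark scrambled
print(bit_accuracy(clean_decode, message))     # 1.0
print(bit_accuracy(attacked_decode, message))  # 0.375
print(removal_success_rate([clean_decode, attacked_decode], message))  # 0.5
```

A successful removal attack pushes bit accuracy toward 0.5 (chance level), so the detector can no longer distinguish the image from an unwatermarked one.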
Novel technique introduced: RAVEN
Invisible watermarking has become a critical mechanism for authenticating AI-generated image content, with major platforms deploying watermarking schemes at scale. However, evaluating the vulnerability of these schemes against sophisticated removal attacks remains essential to assess their reliability and guide robust design. In this work, we expose a fundamental vulnerability in invisible watermarks by reformulating watermark removal as a view synthesis problem. Our key insight is that generating a perceptually consistent alternative view of the same semantic content, akin to re-observing a scene from a shifted perspective, naturally removes the embedded watermark while preserving visual fidelity. This reveals a critical gap: watermarks robust to pixel-space and frequency-domain attacks remain vulnerable to semantic-preserving viewpoint transformations. We introduce a zero-shot diffusion-based framework that applies controlled geometric transformations in latent space, augmented with view-guided correspondence attention to maintain structural consistency during reconstruction. Operating on frozen pre-trained models without detector access or watermark knowledge, our method achieves state-of-the-art watermark suppression across 15 watermarking methods, outperforming 14 baseline attacks while maintaining superior perceptual quality across multiple datasets.
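The abstract does not say which geometric transformations RAVEN applies, with what parameters, or at which diffusion step. As a minimal sketch, assuming a small in-plane rotation plus translation applied to a (C, H, W) latent tensor by inverse-mapped bilinear resampling, the "shifted perspective" step might look like the code below. `warp_latent` is a hypothetical helper, not the authors' code; in a real pipeline the latent would come from a frozen pre-trained encoder, and the diffusion model would be expected to fill the zeroed border reported by the mask.

```python
import numpy as np

def warp_latent(z, angle_deg=0.0, dx=0.0, dy=0.0):
    """Resample a (C, H, W) latent under a rotation about the centre plus a shift.

    Hypothetical sketch: uses inverse mapping, so each output cell looks up its
    source location in z and bilinearly interpolates. Samples that fall outside
    z are zero-filled and flagged False in the returned (H, W) mask.
    """
    c, h, w = z.shape
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Inverse warp: undo the translation, then undo the rotation about the centre.
    xr, yr = xs - dx - cx, ys - dy - cy
    src_x = cos_t * xr + sin_t * yr + cx
    src_y = -sin_t * xr + cos_t * yr + cy

    inside = (src_x >= 0) & (src_x <= w - 1) & (src_y >= 0) & (src_y <= h - 1)
    x0 = np.clip(np.floor(src_x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(src_y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    fx = np.clip(src_x - x0, 0.0, 1.0)
    fy = np.clip(src_y - y0, 0.0, 1.0)

    # Bilinear interpolation, gathered for all channels at once.
    top = (1 - fx) * z[:, y0, x0] + fx * z[:, y0, x1]
    bottom = (1 - fx) * z[:, y1, x0] + fx * z[:, y1, x1]
    warped = (1 - fy) * top + fy * bottom
    return warped * inside, inside

# Usage: a random stand-in for a 4-channel diffusion latent.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 64, 64))
z_view, mask = warp_latent(z, angle_deg=3.0, dx=1.5, dy=-1.0)
print(z_view.shape, mask.mean())
```

Keeping the transform small is what makes the result a plausible "alternative view" of the same content. At the same time, any signal tied to exact pixel or latent positions is displaced relative to where a detector expects it.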
Key Contributions
- Reformulates watermark removal as a novel view synthesis problem, revealing that semantic-preserving viewpoint transformations can defeat watermarks robust to pixel-space and frequency-domain attacks
- Proposes RAVEN, a zero-shot diffusion-based framework applying controlled geometric transformations in latent space with view-guided correspondence attention, requiring no detector access or watermark knowledge
- Achieves state-of-the-art watermark suppression across 15 watermarking methods while outperforming 14 baseline attacks and maintaining superior perceptual quality
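The summary names "view-guided correspondence attention" as the component that keeps structure consistent, but it does not give the formulation. One plausible reading is cross-attention from the new view's tokens to the source view's tokens, with an additive bias favouring the source location each token maps back to under the known geometric transform. The sketch below follows that assumption. `correspondence_attention`, the Gaussian distance bias, and `sigma` are illustrative choices, not the authors' design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_attention(q, k, v, q_src_pos, k_pos, sigma=1.0):
    """Cross-attention biased toward known view-to-view correspondences.

    q: (Nq, d) queries from the target (transformed) view.
    k, v: (Nk, d) keys/values from the source view.
    q_src_pos: (Nq, 2) source-view position each query maps back to.
    k_pos: (Nk, 2) grid positions of the source tokens.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Squared distance between each query's correspondence and each key position.
    dist2 = ((q_src_pos[:, None, :] - k_pos[None, :, :]) ** 2).sum(-1)
    # Gaussian log-bias: keys near the corresponding location get more weight.
    logits = logits - dist2 / (2.0 * sigma ** 2)
    attn = softmax(logits, axis=-1)
    return attn @ v, attn

# Usage: 4x4 token grid, target view shifted one column right of the source.
h = w = 4
ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
k_pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
q_src_pos = k_pos - np.array([0.0, 1.0])
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((h * w, 8)) for _ in range(3))
out, attn = correspondence_attention(q, k, v, q_src_pos, k_pos, sigma=0.5)
print(out.shape, attn.shape)
```

As `sigma` shrinks, each target token copies its corresponding source token, which amounts to a hard warp. As it grows, the bias fades and the layer reverts to ordinary content-based cross-attention. An intermediate value lets the model keep the layout aligned while still regenerating local detail.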
🛡️ Threat Analysis
RAVEN is a watermark removal attack that targets the invisible content watermarks embedded in AI-generated images to trace their provenance and authenticity. Under the OWASP taxonomy, removing or defeating such watermarks is a direct ML09 Output Integrity Attack. Although these protections rely on imperceptible signals, defeating them attacks the integrity of the content itself; it is not a form of adversarial example generation.