Attack · 2026

RAVEN: Erasing Invisible Watermarks via Novel View Synthesis

Fahad Shamshad 1, Nils Lukas 1, Karthik Nandakumar 1,2

0 citations · 45 references · arXiv

Published on arXiv · 2601.08832

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Outperforms 14 baseline watermark removal attacks across 15 watermarking schemes (including SynthID-style semantic watermarks) while preserving superior perceptual quality, operating zero-shot without watermark or detector knowledge.

RAVEN

Novel technique introduced


Invisible watermarking has become a critical mechanism for authenticating AI-generated image content, with major platforms deploying watermarking schemes at scale. However, evaluating the vulnerability of these schemes against sophisticated removal attacks remains essential to assess their reliability and guide robust design. In this work, we expose a fundamental vulnerability in invisible watermarks by reformulating watermark removal as a view synthesis problem. Our key insight is that generating a perceptually consistent alternative view of the same semantic content, akin to re-observing a scene from a shifted perspective, naturally removes the embedded watermark while preserving visual fidelity. This reveals a critical gap: watermarks robust to pixel-space and frequency-domain attacks remain vulnerable to semantic-preserving viewpoint transformations. We introduce a zero-shot diffusion-based framework that applies controlled geometric transformations in latent space, augmented with view-guided correspondence attention to maintain structural consistency during reconstruction. Operating on frozen pre-trained models without detector access or watermark knowledge, our method achieves state-of-the-art watermark suppression across 15 watermarking methods, outperforming 14 baseline attacks while maintaining superior perceptual quality across multiple datasets.
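The core operation the abstract describes, applying a controlled geometric transformation directly to a latent feature map, can be sketched in isolation. The toy example below (function name, rotation angle, and nearest-neighbor sampling are illustrative assumptions, not the paper's implementation, which runs inside a frozen diffusion model with view-guided correspondence attention) warps a latent tensor by a small rotation via inverse mapping, the kind of viewpoint shift that breaks pixel-aligned watermark signals while preserving semantic content:

```python
import numpy as np

def warp_latent(latent, angle_deg=5.0):
    """Rotate a latent feature map (C, H, W) about its center via
    inverse mapping with nearest-neighbor sampling.

    Toy stand-in for a controlled geometric transformation in latent
    space; a real pipeline would warp inside a diffusion model and
    repair the result during denoising."""
    c, h, w = latent.shape
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Inverse rotation: for each output location, find its source pixel.
    src_x = cos_t * (xs - cx) + sin_t * (ys - cy) + cx
    src_y = -sin_t * (xs - cx) + cos_t * (ys - cy) + cy
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return latent[:, src_y, src_x]

# A random stand-in latent; shape is preserved by the warp.
latent = np.random.default_rng(0).normal(size=(4, 64, 64))
warped = warp_latent(latent, angle_deg=5.0)
```

Even this crude warp desynchronizes any watermark embedded at fixed latent coordinates; RAVEN's contribution is making such transformations semantic-preserving and artifact-free, which the correspondence-attention reconstruction handles.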


Key Contributions

  • Reformulates watermark removal as a novel view synthesis problem, revealing that semantic-preserving viewpoint transformations can defeat watermarks robust to pixel-space and frequency-domain attacks
  • Proposes RAVEN, a zero-shot diffusion-based framework applying controlled geometric transformations in latent space with view-guided correspondence attention, requiring no detector access or watermark knowledge
  • Achieves state-of-the-art watermark suppression across 15 watermarking methods while outperforming 14 baseline attacks and maintaining superior perceptual quality

🛡️ Threat Analysis

Output Integrity Attack

RAVEN is a watermark removal attack targeting invisible watermarks embedded in AI-generated images to establish provenance and authenticity. Removing or defeating content watermarks is a direct ML09 Output Integrity Attack under the OWASP ML taxonomy: although these protections rely on imperceptible signals, defeating them compromises the integrity of the output content, rather than constituting adversarial example generation.


Details

Domains
vision · generative
Model Types
diffusion
Threat Tags
black_box · inference_time · digital
Applications
ai-generated image watermarking · content provenance authentication