Benchmark · 2026

The Orthogonal Vulnerabilities of Generative AI Watermarks: A Comparative Empirical Benchmark of Spatial and Latent Provenance

Jesse Yu 1, Nicholas Wei 2



Published on arXiv (2603.10323)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Spatial watermarks suffer a 67.47% AER evasion rate under Img2Img translation, while latent watermarks yield a 43.20% AER evasion rate under static cropping, demonstrating that no single-domain watermarking scheme is robust to modern adversarial toolsets.

Adversarial Evasion Region (AER)

Novel technique introduced


As open-weights generative AI rapidly proliferates, the ability to synthesize hyper-realistic media has introduced profound challenges to digital trust. Automated disinformation and AI-generated imagery have made robust digital provenance a critical cybersecurity imperative. Currently, state-of-the-art invisible watermarks operate within one of two primary mathematical manifolds: the spatial domain (post-generation pixel embedding) or the latent domain (pre-generation frequency embedding). While existing literature frequently evaluates these models against isolated, classical distortions, there is a critical lack of rigorous, comparative benchmarking against modern generative AI editing tools. In this study, we empirically evaluate two leading representative paradigms, RivaGAN (spatial) and Tree-Ring (latent), using an automated Attack Simulation Engine across 30 intensity intervals of geometric and generative perturbations. We formalize an "Adversarial Evasion Region" (AER) framework to measure cryptographic degradation against semantic visual retention (OpenCLIP > 75.0). Our statistical analysis ($n=100$ per interval, $MOE = \pm 3.92\%$) reveals that these domains possess mutually exclusive, mathematically orthogonal vulnerabilities. Spatial watermarks experience severe cryptographic degradation under algorithmic pixel-rewriting (a 67.47% AER evasion rate under Img2Img translation), whereas latent watermarks exhibit profound fragility against geometric misalignment (a 43.20% AER evasion rate under static cropping). By demonstrating that single-domain watermarking is fundamentally insufficient against modern adversarial toolsets, this research exposes a systemic vulnerability in current digital provenance standards and establishes the foundational case for multi-domain cryptographic architectures.
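The AER definition above can be sketched in a few lines: an attacked sample counts toward AER evasion only if the watermark decoder fails *and* the image still clears the semantic-retention floor (OpenCLIP > 75.0). This is a minimal illustration of that joint condition; the function and data names are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of the Adversarial Evasion Region (AER) metric.
# A sample is an "AER evasion" only when the watermark is NOT detected
# AND the attacked image remains semantically faithful to the original
# (CLIP similarity above the paper's retention threshold).

CLIP_THRESHOLD = 75.0  # semantic visual retention floor from the paper

def aer_evasion_rate(samples):
    """samples: list of (watermark_detected: bool, clip_score: float).
    Returns the AER evasion rate as a percentage."""
    evading = sum(
        1 for detected, clip_score in samples
        if not detected and clip_score > CLIP_THRESHOLD
    )
    return 100.0 * evading / len(samples)

# Toy batch: three attacked images evade detection while staying
# semantically intact; one evades but is too degraded to count.
batch = [
    (False, 91.2),  # evades, above threshold -> counts toward AER
    (False, 88.0),  # counts
    (False, 62.5),  # evades but semantically degraded -> excluded
    (True,  95.1),  # watermark survives -> not an evasion
    (False, 80.3),  # counts
]
print(aer_evasion_rate(batch))  # -> 60.0
```

The retention gate is what separates AER from a raw detector-failure rate: an attack that destroys the watermark by destroying the image does not count as a successful evasion.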


Key Contributions

  • Proposes the Adversarial Evasion Region (AER) framework to quantify watermark cryptographic degradation while enforcing semantic visual retention (OpenCLIP > 75.0)
  • Empirically demonstrates that spatial and latent watermarking paradigms have mathematically orthogonal vulnerabilities: spatial (RivaGAN) collapses under Img2Img translation (67.47% AER), while latent (Tree-Ring) collapses under geometric cropping (43.20% AER)
  • Establishes an automated Attack Simulation Engine across 30 intensity intervals, providing rigorous comparative evaluation against modern generative AI editing tools rather than classical distortions
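The Attack Simulation Engine described above amounts to an intensity sweep: apply one attack at each of 30 evenly spaced intensity intervals to a fixed image set, and record a detector-failure rate per interval. The sketch below shows that loop shape only; the crop attack and detector are illustrative placeholders, and the paper's actual attacks, detectors, and dataset are not reproduced here.

```python
# Minimal sketch of a 30-interval attack-intensity sweep in the style of
# the paper's Attack Simulation Engine. All callables are placeholders.

N_INTERVALS = 30  # intensity intervals evaluated in the paper

def sweep(attack, detect, images):
    """Run `attack` at each intensity and measure detector failure.
    Returns a list of (intensity, percent_of_images_evading_detection)."""
    intensities = [i / N_INTERVALS for i in range(1, N_INTERVALS + 1)]
    curve = []
    for t in intensities:
        failures = sum(1 for img in images if not detect(attack(img, t)))
        curve.append((t, 100.0 * failures / len(images)))
    return curve

# Toy stand-ins: a "crop" that just records its intensity, and a
# detector that breaks once more than half the image is cropped away.
def toy_crop(img, t):
    return {"payload": img["payload"], "crop": t}

def toy_detect(img):
    return img["crop"] < 0.5

images = [{"payload": i, "crop": 0.0} for i in range(4)]
curve = sweep(toy_crop, toy_detect, images)
print(len(curve), curve[0][1], curve[-1][1])  # -> 30 0.0 100.0
```

Pairing each interval's failure rate with a semantic-retention check (as in the AER definition) yields the per-attack evasion curves the benchmark reports.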

🛡️ Threat Analysis

Output Integrity Attack

Evaluates attacks that defeat content provenance watermarks (RivaGAN spatial, Tree-Ring latent) embedded in AI-generated images, measuring cryptographic degradation while preserving visual semantics — directly addressing output integrity and content authentication.


Details

Domains
vision, generative
Model Types
diffusion, GAN
Threat Tags
black_box, inference_time, digital
Datasets
Custom evaluation dataset (n=100 per intensity interval, 30 intervals); OpenCLIP semantic similarity scoring
Applications
AI-generated image watermarking, digital provenance, content authentication