attack 2025

D2RA: Dual Domain Regeneration Attack

Pragati Shuddhodhan Meshram, Varun Chandrasekaran

0 citations · 19 references · arXiv

Published on arXiv

2510.07538

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

D2RA consistently reduces watermark detectability across diverse semantic watermarking schemes in a training-free, black-box, single-image setting without perceptible visual degradation.

D2RA (Dual Domain Regeneration Attack)

Novel technique introduced


The growing use of generative models has intensified the need for watermarking methods that ensure content attribution and provenance. While recent semantic watermarking schemes improve robustness by embedding signals in latent or frequency representations, we show they remain vulnerable even under resource-constrained adversarial settings. We present D2RA, a training-free, single-image attack that removes or weakens watermarks without access to the underlying model. By projecting watermarked images onto natural priors across complementary representations, D2RA suppresses watermark signals while preserving visual fidelity. Experiments across diverse watermarking schemes demonstrate that our approach consistently reduces watermark detectability, revealing fundamental weaknesses in current designs. Our code is available at https://github.com/Pragati-Meshram/DAWN.


Key Contributions

  • Training-free, single-image watermark removal attack (D2RA) requiring no access to the watermarking model or its internals
  • Dual-domain projection strategy that suppresses watermark signals across complementary latent/frequency representations while preserving visual fidelity
  • Systematic demonstration of fundamental weaknesses in current semantic watermarking schemes under resource-constrained adversarial settings
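
The dual-domain idea in the second contribution can be illustrated with a toy sketch. This is not the authors' implementation (their code is in the repository linked in the abstract). It assumes a grayscale image with values in [0, 1], uses a radial-median clip as a stand-in for a frequency-domain natural prior, and uses a noise-then-smooth step as a stand-in for regeneration with a learned model. All function names here are hypothetical.

```python
import numpy as np


def clip_spectral_outliers(img, k=3.0):
    """Frequency-domain step (toy): cap each Fourier magnitude at k times the
    median magnitude of its radial frequency ring, keeping the phase."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    mag, phase = np.abs(spectrum), np.angle(spectrum)
    yy, xx = np.indices((h, w))
    ring_id = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    capped = mag.copy()
    for r in np.unique(ring_id):
        ring = ring_id == r
        capped[ring] = np.minimum(mag[ring], k * np.median(mag[ring]))
    restored = np.fft.ifft2(np.fft.ifftshift(capped * np.exp(1j * phase)))
    return np.real(restored)


def noise_then_smooth(img, sigma=0.05, seed=0):
    """Spatial-domain step (toy): add Gaussian noise, then apply a 3x3 mean
    filter. A real regeneration attack would use a learned denoiser here."""
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(0.0, sigma, img.shape)
    padded = np.pad(noisy, 1, mode="reflect")
    h, w = img.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.mean(windows, axis=0)


def dual_domain_toy(img):
    """Chain both steps and clip back to the valid [0, 1] intensity range."""
    return np.clip(noise_then_smooth(clip_spectral_outliers(img)), 0.0, 1.0)


def planted_peak(img, fy=5, fx=8):
    """Magnitude of the Fourier coefficient at frequency (fy, fx)."""
    return np.abs(np.fft.fft2(img))[fy, fx]


# Synthetic test image: noise background plus a periodic carrier that plays
# the role of a frequency-domain watermark (an assumption for illustration).
rng = np.random.default_rng(1)
h = w = 64
yy, xx = np.indices((h, w))
carrier = 0.1 * np.cos(2 * np.pi * (5 * yy + 8 * xx) / 64)
marked = np.clip(0.5 + 0.1 * rng.standard_normal((h, w)) + carrier, 0.0, 1.0)
attacked = dual_domain_toy(marked)
print("peak before:", planted_peak(marked), "peak after:", planted_peak(attacked))
```

The point of the sketch is only the structure: one projection acts on the spectrum, the other on pixels, and a signal that survives one step may still be weakened by the other. Real semantic watermarks live in model latents and are far harder to isolate than this planted carrier.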

🛡️ Threat Analysis

Output Integrity Attack

D2RA directly attacks content watermarks embedded in AI-generated images to defeat provenance and attribution schemes, which makes it a canonical watermark removal attack on output integrity. The paper explicitly targets semantic watermarking methods (latent- and frequency-domain) used for content provenance; its goal is to suppress watermark detectability, not to cause model misclassification.
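
An attack of this kind is usually judged on two axes: how often the watermark is still detected, and how much the image changed. The sketch below shows one way to compute both. It is an assumption for illustration: this page does not state which metrics the paper reports, PSNR is only one common fidelity measure, and `detector` is a hypothetical callable standing in for a scheme's real detector.

```python
import numpy as np


def psnr(reference, attacked, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    ref = np.asarray(reference, dtype=float)
    att = np.asarray(attacked, dtype=float)
    mse = np.mean((ref - att) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


def detection_rate(images, detector):
    """Fraction of images for which the (hypothetical) detector returns True."""
    flags = [bool(detector(img)) for img in images]
    return sum(flags) / len(flags)


# Usage with a placeholder detector that thresholds mean intensity.
# A real watermark detector would replace this lambda.
toy_detector = lambda img: img.mean() > 0.5
originals = [np.full((8, 8), 0.6), np.full((8, 8), 0.7)]
attacked = [img - 0.15 for img in originals]
print("detection before:", detection_rate(originals, toy_detector))
print("detection after:", detection_rate(attacked, toy_detector))
print("PSNR of first pair (dB):", psnr(originals[0], attacked[0]))
```

A successful removal attack lowers the detection rate while keeping the fidelity score high; reporting only one of the two numbers says little.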


Details

Domains
vision · generative
Model Types
diffusion · gan
Threat Tags
black_box · inference_time
Applications
image watermarking · ai-generated content provenance · content attribution