attack 2026

Breaking Watermarks in the Frequency Domain: A Modulated Diffusion Attack Framework

Chunpeng Wang 1, Binyan Qu 1, Xiaoyu Wang 2, Zhiqiu Xia 1, Shanshan Zhang 3, Yunan Liu 4, Qi Li 1

Published on arXiv

2604.22220

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves superior visual fidelity compared to existing watermark attacks while effectively neutralizing invisible watermark signals across diverse watermarking schemes

FMDiffWA

Novel technique introduced


Digital image watermarking has advanced rapidly for copyright protection of generative AI, yet the comparatively limited progress in watermark attack techniques has broken the attack-defense balance and hindered further advances in the field. In this paper, we propose FMDiffWA, a frequency-domain modulated diffusion framework for watermark attacks. Specifically, we introduce a frequency-domain watermark modulation (FWM) module and incorporate it into the sampling stages of both the forward and reverse diffusion processes. This mechanism enables selective modulation of watermark-related frequency components, thereby allowing FMDiffWA to effectively neutralize invisible watermark signals while preserving the perceptual quality of the attacked watermarked images. To achieve a better trade-off between attack efficacy and visual fidelity, we reformulate the training strategy of conventional diffusion models by augmenting the canonical noise estimation objective with an auxiliary refinement constraint. Comprehensive experiments demonstrate that FMDiffWA achieves superior visual fidelity compared to existing watermark attacks, while exhibiting strong generalization across diverse watermarking schemes.
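To make the idea of "selective modulation of frequency components" concrete, here is a minimal sketch in Python. It is not the paper's FWM module, whose design this summary does not describe. It only illustrates the general operation: transform an image to the frequency domain, scale the coefficients in a chosen radial band, and transform back. The function names (`dft2`, `idft2`, `modulate`), the band limits, and the gain are all hypothetical, and a naive DFT from the standard library is used to stay dependency-free (a real implementation would use an FFT routine).

```python
import cmath
import math

def dft2(img):
    """Naive 2-D DFT of a list-of-lists image (slow; for tiny demos only)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)] for u in range(h)]

def idft2(spec):
    """Inverse of dft2; returns the real part of each pixel."""
    h, w = len(spec), len(spec[0])
    return [[(sum(spec[u][v] * cmath.exp(2j * math.pi * (u * y / h + v * x / w))
                  for u in range(h) for v in range(w)) / (h * w)).real
             for x in range(w)] for y in range(h)]

def modulate(img, lo, hi, gain):
    """Scale coefficients whose radial frequency r satisfies lo <= r < hi.

    ASSUMPTION: a fixed radial band with a scalar gain stands in for
    whatever learned or adaptive modulation the paper actually uses.
    """
    h, w = len(img), len(img[0])
    spec = dft2(img)
    for u in range(h):
        for v in range(w):
            # Fold indices so that u and h - u count as the same frequency;
            # this keeps the mask symmetric and the output real-valued.
            r = math.hypot(min(u, h - u), min(v, w - v))
            if lo <= r < hi:
                spec[u][v] *= gain
    return idft2(spec)

# Usage: a constant image plus a ripple at frequency 2 along x.
img = [[10 + math.cos(2 * math.pi * 2 * x / 8) for x in range(8)] for _ in range(8)]
flat = modulate(img, 1.5, 2.5, 0.0)  # suppress only the band holding the ripple
```

In this toy case the ripple plays the role of a watermark confined to one frequency band: zeroing that band removes it while leaving the constant background untouched.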


Key Contributions

  • Frequency-domain watermark modulation (FWM) module integrated into diffusion forward/reverse processes for selective watermark suppression
  • Reformulated diffusion training strategy combining noise estimation with an auxiliary refinement constraint for a better trade-off between attack efficacy and visual fidelity
  • Demonstrates strong generalization across diverse watermarking schemes while maintaining superior visual quality
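
The second contribution, a noise estimation objective augmented with a refinement constraint, can be sketched as a two-term loss. This summary does not give the paper's actual constraint, so the sketch below makes loud assumptions: the refinement term is taken to be a mean squared error between a reference image and the clean-image estimate recovered from the predicted noise (using the standard DDPM relation), and the weight `lam` and all function names are hypothetical.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def predict_x0(x_t, eps_pred, alpha_bar):
    """Standard DDPM inversion: x0 = (x_t - sqrt(1 - a) * eps) / sqrt(a)."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - n * e) / s for xt, e in zip(x_t, eps_pred)]

def combined_loss(x0, eps, eps_pred, alpha_bar, refine_target=None, lam=0.1):
    """Noise-estimation loss plus a weighted refinement term.

    ASSUMPTION: the refinement term and `lam` are illustrative only.
    `refine_target` defaults to x0; a different reference image could be
    supplied if the attack trains toward some other target.
    """
    if refine_target is None:
        refine_target = x0
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    # Forward diffusion: noisy sample at the step with cumulative alpha_bar.
    x_t = [s * x + n * e for x, e in zip(x0, eps)]
    noise_term = mse(eps_pred, eps)
    refine_term = mse(predict_x0(x_t, eps_pred, alpha_bar), refine_target)
    return noise_term + lam * refine_term

# Usage with flattened toy "images" of four pixels.
x0 = [0.2, -0.1, 0.4, 0.0]
eps = [0.5, -0.3, 0.1, 0.9]
perfect = combined_loss(x0, eps, eps, alpha_bar=0.5)
off_by_one = combined_loss(x0, eps, [e + 1.0 for e in eps], alpha_bar=0.5)
```

When `refine_target` equals `x0`, the refinement term is only a timestep-dependent reweighting of the noise term, so a constraint of this shape changes the objective meaningfully only when it compares against a different reference or uses a different distance.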

🛡️ Threat Analysis

Output Integrity Attack

The paper attacks content watermarking schemes designed to protect AI-generated images by removing the watermarks embedded for copyright protection and provenance tracking. This is a direct attack on output integrity and content authentication mechanisms, which places it under ML09. It is not ML01, because the goal is to defeat a protection scheme (watermark removal) rather than to craft adversarial examples that cause misclassification.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time, digital
Applications
copyright protection, ai-generated content authentication, image watermarking