
SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark

Rui Bao 1, Zheng Gao 2, Xiaoyu Li 1, Xiaoyan Feng 1, Yang Song 1, Jiaojiao Jiang 1


Published on arXiv (arXiv:2603.29742)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves 95–100% watermark removal success rates across nine watermarking methods spanning noise-space, frequency-domain, and optimization-based paradigms, with nearly no loss in semantic quality

SHIFT

Novel technique introduced


Diffusion-based watermarking methods embed verifiable marks by manipulating the initial noise or the reverse diffusion trajectory. However, these methods share a critical assumption: verification can succeed only if the diffusion trajectory can be faithfully reconstructed. This reliance on trajectory recovery constitutes a fundamental and exploitable vulnerability. We propose $\underline{\mathbf{S}}$tochastic $\underline{\mathbf{Hi}}$dden-Trajectory De$\underline{\mathbf{f}}$lec$\underline{\mathbf{t}}$ion ($\mathbf{SHIFT}$), a training-free attack that exploits this common weakness across diverse watermarking paradigms. SHIFT leverages stochastic diffusion resampling to deflect the generative trajectory in latent space, making the reconstructed image statistically decoupled from the original watermark-embedded trajectory while preserving strong visual quality and semantic consistency. Extensive experiments on nine representative watermarking methods spanning noise-space, frequency-domain, and optimization-based paradigms show that SHIFT achieves 95%--100% attack success rates with nearly no loss in semantic quality, without requiring any watermark-specific knowledge or model retraining.
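The resampling idea can be sketched numerically. The snippet below is a minimal, self-contained illustration, not the paper's implementation: the correlation detector, the key-in-noise embedding, and the `keep` ratio are simplifying assumptions standing in for a generic noise-space watermarking scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # flattened latent dimension (illustrative)

# Toy noise-space watermark: a secret key pattern mixed into the initial noise.
key = rng.standard_normal(d)
z_watermarked = key + 0.5 * rng.standard_normal(d)

def detect(z, key, thresh=0.7):
    """Toy verifier: normalized correlation between a recovered latent and the key."""
    corr = float(z @ key / (np.linalg.norm(z) * np.linalg.norm(key)))
    return corr > thresh, corr

# Verification succeeds when the watermark-bearing latent is faithfully recovered.
passed, corr_before = detect(z_watermarked, key)   # detected

# Stochastic deflection: partially re-noise the latent with fresh Gaussian noise,
# statistically decoupling it from the watermarked trajectory while retaining
# most of the original signal ('keep' is an assumed, illustrative ratio).
keep = 0.3
z_deflected = np.sqrt(keep) * z_watermarked + np.sqrt(1 - keep) * rng.standard_normal(d)

still_detected, corr_after = detect(z_deflected, key)   # falls below threshold
```

In this toy setting the detector statistic drops roughly with √keep, so verification fails even though most of the latent's energy is preserved; SHIFT performs the analogous decoupling in the diffusion model's latent space while keeping visual and semantic quality intact.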


Key Contributions

  • Training-free watermark removal attack exploiting trajectory reconstruction dependency across diverse diffusion watermarking paradigms
  • Stochastic resampling technique that deflects generative trajectories while preserving semantic quality
  • Achieves 95-100% attack success rates against nine representative watermarking methods without watermark-specific knowledge

🛡️ Threat Analysis

Output Integrity Attack

This paper attacks content watermarking schemes embedded in diffusion model outputs. Watermark removal is a classic ML09 attack — it defeats output integrity/provenance verification mechanisms. The watermarks are embedded in generated images to verify authenticity, and SHIFT removes them while preserving visual quality.
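Why trajectory recovery is the single point of failure can be seen in a toy DDIM round trip. The sketch below is a simplification under an assumed linear noise predictor `eps(x) = C * x`, which makes inversion exact (real samplers only approximate it): deterministic generation can be inverted back to the exact initial noise, but a single stochastic resampling of the output latent destroys that recovery.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 20, 256
alphas = np.linspace(0.9995, 0.98, T)  # toy noise schedule
abar = np.cumprod(alphas)              # cumulative products (alpha-bar)
C = 0.1                                # assumed linear noise predictor: eps(x) = C * x

def ddim_step(x, t):
    """Deterministic DDIM update x_t -> x_{t-1} (eta = 0)."""
    a_t = abar[t]
    a_prev = abar[t - 1] if t > 0 else 1.0
    eps = C * x
    x0 = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0 + np.sqrt(1 - a_prev) * eps

def ddim_invert_step(x, t):
    """Exact inverse of ddim_step (possible here only because eps is linear in x)."""
    a_t = abar[t]
    a_prev = abar[t - 1] if t > 0 else 1.0
    m = np.sqrt(a_prev) * (1 - np.sqrt(1 - a_t) * C) / np.sqrt(a_t) + np.sqrt(1 - a_prev) * C
    return x / m

z_T = rng.standard_normal(d)           # watermark-bearing initial noise
x = z_T
for t in range(T - 1, -1, -1):         # generate: denoise from t = T-1 down to 0
    x = ddim_step(x, t)

z_rec = x
for t in range(T):                     # invert the trajectory back to the noise
    z_rec = ddim_invert_step(z_rec, t)
err_clean = np.linalg.norm(z_rec - z_T) / np.linalg.norm(z_T)   # ~0: verifiable

# One stochastic resampling of the output latent breaks the round trip,
# even though 90% of the signal variance is kept.
keep = 0.9
x_deflected = np.sqrt(keep) * x + np.sqrt(1 - keep) * rng.standard_normal(d)
z_rec2 = x_deflected
for t in range(T):
    z_rec2 = ddim_invert_step(z_rec2, t)
err_attacked = np.linalg.norm(z_rec2 - z_T) / np.linalg.norm(z_T)  # large: not verifiable
```

Because the shared assumption the paper identifies is that verification runs some variant of this inversion, injecting fresh stochasticity into the latent defeats verification across paradigms at once, which is consistent with SHIFT transferring across nine methods without watermark-specific knowledge.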


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time, black_box
Applications
image generation, content authentication, watermark verification