defense 2026

Towards Robust Content Watermarking Against Removal and Forgery Attacks

Yifan Zhu 1,2, Yihan Wang 3,1,2, Xiao-Shan Gao 1,2



Published on arXiv: 2604.06662

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Demonstrates improved robustness of diffusion-model watermarks against removal and forgery attacks, achieved through dynamic, instance-specific watermark injection.

ISTS

Novel technique introduced


AI-generated content has raised serious concerns about copyright protection, image provenance, and credit attribution. One potential solution to these problems is watermarking. Recently, content watermarking for text-to-image diffusion models has been studied extensively because of its effective detection utility and robustness. However, these watermarking techniques are vulnerable to adversarial attacks such as removal attacks and forgery attacks. In this paper, we build a novel watermarking paradigm called Instance-Specific watermarking with Two-Sided detection (ISTS) to resist removal and forgery attacks. Specifically, we introduce a strategy that dynamically controls the injection time and watermarking patterns based on the semantics of users' prompts. Furthermore, we propose a new two-sided detection approach to enhance robustness in watermark detection. Experiments demonstrate the superior robustness of our watermarking against removal and forgery attacks.
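The abstract states that ISTS chooses the injection time and the watermark pattern per prompt. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation. It assumes a secret key and uses a SHA-256 hash of the normalized prompt text as a stand-in for the prompt *semantics* the paper actually uses. The names `plan_for_prompt` and `WatermarkPlan`, the 50-step schedule, the step range, and the 64-entry ±1 pattern are all illustrative assumptions.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WatermarkPlan:
    """Hypothetical per-prompt watermark configuration."""
    inject_step: int  # denoising step at which the pattern would be injected
    pattern: tuple    # +1/-1 pattern of length `dim`


def plan_for_prompt(prompt: str, secret_key: bytes, num_steps: int = 50,
                    step_range: tuple = (10, 40), dim: int = 64) -> WatermarkPlan:
    """Derive an instance-specific injection step and pattern from a prompt.

    Assumption: hashing normalized text stands in for the prompt semantics
    that ISTS reportedly uses; a real system would differ.
    """
    lo, hi = step_range
    if not (0 <= lo <= hi < num_steps):
        raise ValueError("step_range must lie within [0, num_steps)")
    # Normalize case and whitespace so trivially different spellings
    # of the same prompt map to the same plan.
    normalized = " ".join(prompt.lower().split())
    digest = hashlib.sha256(secret_key + normalized.encode("utf-8")).digest()
    # Seed a deterministic PRNG from the keyed digest.
    rng = random.Random(int.from_bytes(digest, "big"))
    inject_step = rng.randint(lo, hi)  # inclusive on both ends
    pattern = tuple(rng.choice((-1, 1)) for _ in range(dim))
    return WatermarkPlan(inject_step, pattern)
```

Because the plan depends on both the prompt and the key, a detector holding the key can regenerate the expected pattern for a given prompt, while a watermark copied from an image generated for a different prompt would carry a different pattern.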


Key Contributions

  • Instance-Specific watermarking with Two-Sided detection (ISTS) paradigm that dynamically controls injection time and patterns based on prompt semantics
  • Two-sided detection approach enhancing robustness against removal and forgery attacks
  • Experimental evidence of improved robustness against adversarial watermark removal and forgery attacks
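This summary does not define "two-sided detection." One plausible reading, assumed here purely for illustration, is a two-sided hypothesis test on a correlation score: the detector treats a significantly positive score as a watermark and a significantly negative score as evidence of tampering, for example a removal attack that overshoots and inverts the pattern. The functions `correlation_z` and `two_sided_detect`, the ±1 pattern representation, and the significance level `alpha` are hypothetical and do not come from the paper.

```python
import math
from statistics import NormalDist


def correlation_z(observed, expected):
    """z-score of the inner product between two +/-1 patterns.

    Under the assumed null hypothesis that each observed entry is an
    independent +/-1 with equal probability, sum(o * e) has mean 0 and
    variance n, so z is approximately N(0, 1) for large n.
    """
    if len(observed) != len(expected) or not observed:
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(observed)
    return sum(o * e for o, e in zip(observed, expected)) / math.sqrt(n)


def two_sided_detect(observed, expected, alpha=0.01):
    """Hypothetical two-sided decision rule.

    Returns ("watermarked", z) for a significantly positive score,
    ("tampered", z) for a significantly negative one, and ("clean", z)
    otherwise.
    """
    z = correlation_z(observed, expected)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    if z > threshold:
        return "watermarked", z
    if z < -threshold:
        return "tampered", z
    return "clean", z
```

With a 64-entry pattern, a perfect match gives z = 8 and a fully inverted pattern gives z = -8; both exceed the two-sided critical value of about 2.58 at alpha = 0.01, so a one-sided detector would miss the second case while this rule flags it.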

Threat Analysis

Output Integrity Attack

Proposes content watermarking for diffusion-generated images that protects provenance and resists watermark removal and forgery attacks; this falls under output integrity and content authentication.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time
Applications
copyright protection, image provenance, credit attribution