Towards Robust Content Watermarking Against Removal and Forgery Attacks
Yifan Zhu 1,2, Yihan Wang 3,1,2, Xiao-Shan Gao 1,2
Published on arXiv
2604.06662
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Reports greater robustness against removal and forgery attacks than existing content watermarks for text-to-image diffusion models, achieved through dynamic, instance-specific injection and two-sided detection
Novel technique introduced
ISTS
Generated content has raised serious concerns about copyright protection, image provenance, and credit attribution. Watermarking is a potential solution to these problems. Recently, content watermarking for text-to-image diffusion models has been studied extensively for its detection utility and robustness. However, these watermarking techniques are vulnerable to adversarial attacks, such as removal attacks and forgery attacks. In this paper, we build a novel watermarking paradigm called Instance-Specific watermarking with Two-Sided detection (ISTS) to resist removal and forgery attacks. Specifically, we introduce a strategy that dynamically controls the injection time and watermarking patterns based on the semantics of users' prompts. Furthermore, we propose a new two-sided detection approach to enhance robustness in watermark detection. Experiments demonstrate the superiority of our watermarking method against removal and forgery attacks.
Key Contributions
- Instance-Specific watermarking with Two-Sided detection (ISTS) paradigm that dynamically controls injection time and patterns based on prompt semantics
- Two-sided detection approach enhancing robustness against removal and forgery attacks
- Demonstrated superior robustness against adversarial watermark manipulation attacks
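
The summary above does not say how prompt semantics are mapped to an injection time and a watermark pattern. The following is only an illustrative sketch of one way such a mapping could be built, not the authors' method. It assumes a prompt embedding is already available from some text encoder, quantizes it with random-hyperplane sign bits so that embeddings pointing in the same direction share a code, and derives an injection step and a pattern index from a keyed hash. The names `sign_bits`, `instance_key`, `secret`, `total_steps`, and `num_patterns` are hypothetical.

```python
import hashlib
import random


def sign_bits(embedding, num_bits=16, seed=0):
    """Quantize an embedding with fixed random hyperplanes (sign of each projection).

    Assumption: `embedding` is a list of floats produced elsewhere by a text encoder.
    """
    rng = random.Random(seed)  # fixed seed so the hyperplanes are reproducible
    bits = []
    for _ in range(num_bits):
        plane = [rng.gauss(0.0, 1.0) for _ in embedding]
        dot = sum(p * e for p, e in zip(plane, embedding))
        bits.append(1 if dot >= 0 else 0)
    return bits


def instance_key(embedding, secret, total_steps=50, num_patterns=8):
    """Derive a hypothetical (injection_step, pattern_id) pair for one prompt.

    `secret` is a bytes key held by the watermark owner; hashing it together with
    the semantic code keeps the mapping unpredictable without the key.
    """
    code = bytes(sign_bits(embedding))
    digest = hashlib.sha256(secret + code).digest()
    injection_step = digest[0] % total_steps   # which denoising step to inject at
    pattern_id = digest[1] % num_patterns      # which watermark pattern to use
    return injection_step, pattern_id


if __name__ == "__main__":
    toy_embedding = [0.3, -1.2, 0.7, 0.05]  # stand-in for a real prompt embedding
    print(instance_key(toy_embedding, b"owner-secret"))
```

Because only the signs of the projections are kept, rescaling an embedding leaves the derived key unchanged, while prompts whose embeddings point in different directions are likely to receive different injection steps and patterns. The two-sided detection step is not sketched here, since the summary gives no detail on how it is defined.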
🛡️ Threat Analysis
Proposes content watermarking for diffusion-generated images to protect provenance and resist watermark removal and forgery attacks, which places the work under output integrity and content authentication.