Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!
Zihang Zou 1, Boqing Gong 2, Liqiang Wang 1
Published on arXiv (arXiv:2603.00150)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Using the proposed gradient-based cross-attention perturbation, diffusion models can replicate copyrighted images and evade both visible and invisible watermark detectors without any model fine-tuning.
Anchors and Shims
Novel technique introduced
In this paper, we highlight a critical threat posed by emerging neural models: data plagiarism. We demonstrate how modern neural models (e.g., diffusion models) can replicate copyrighted images, even when protected by advanced watermarking techniques. To expose vulnerabilities in copyright protection and facilitate future research, we propose a general approach to neural plagiarism that can either forge replicas of copyrighted data or introduce copyright ambiguity. Our method, based on "anchors and shims", employs inverse latents as anchors and finds shim perturbations that gradually steer the anchor latents away from their originals, thereby evading watermark or copyright detection. By applying perturbations to the cross-attention mechanism at different timesteps, our approach induces varying degrees of semantic modification in copyrighted images, enabling it to bypass protections ranging from visible trademarks and signatures to invisible watermarks. Notably, our method is a purely gradient-based search that requires no additional training or fine-tuning. Experiments on MS-COCO and real-world copyrighted images show that diffusion models can replicate copyrighted images, underscoring the urgent need for countermeasures against neural plagiarism.
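The core loop described above can be illustrated with a toy sketch: start from an anchor latent (standing in for the inverse latent of the copyrighted image) and run a gradient search for a shim perturbation that drives a differentiable watermark-detector score down while a distance penalty keeps the perturbed latent close to the anchor. This is a minimal numpy stand-in, not the paper's implementation: the linear-probe detector, the loss weights, and all function names here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detector_score(z, w):
    # Toy differentiable watermark detector (a linear probe):
    # a high score means "watermark detected".
    return sigmoid(w @ z)

def find_shim(z_anchor, w, steps=200, lr=0.5, lam=0.05):
    """Gradient search for a shim perturbation `delta` that lowers the
    detector score while the L2 penalty keeps z_anchor + delta near the
    anchor (i.e., the replica stays visually close to the original)."""
    delta = np.zeros_like(z_anchor)
    for _ in range(steps):
        z = z_anchor + delta
        s = detector_score(z, w)
        # d(score)/d(delta) = s*(1-s)*w ;  d(lam*||delta||^2)/d(delta) = 2*lam*delta
        grad = s * (1.0 - s) * w + 2.0 * lam * delta
        delta -= lr * grad
    return delta

rng = np.random.default_rng(0)
d = 64
z0 = rng.normal(size=d)
w = rng.normal(size=d)
# Shift the anchor so the toy detector initially flags the watermark.
z_anchor = z0 + (2.0 - w @ z0) / (w @ w) * w

delta = find_shim(z_anchor, w)
before = detector_score(z_anchor, w)          # high: watermark detected
after = detector_score(z_anchor + delta, w)   # low: detection evaded
```

The design point this sketch captures is that the attack needs only gradients through the detector, no retraining or fine-tuning of the generative model itself.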
Key Contributions
- Anchors-and-shims pipeline that uses inverse latents as anchors and finds gradient-based shim perturbations to replicate copyrighted images while evading watermark detection
- Novel cross-attention perturbation technique applied at different denoising timesteps to induce coarse-to-fine semantic modification with low memory overhead
- Comprehensive empirical analysis covering forgery and copyright-ambiguity attacks against visible trademarks, signatures, and invisible watermarks on MS-COCO and real-world images
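The second contribution, timestep-dependent cross-attention perturbation, can be sketched as a shim added to the attention logits only inside a chosen timestep window: perturbing early (high-noise) steps changes coarse semantics, while perturbing late steps changes fine detail. The toy denoising loop below is a hypothetical stand-in for a diffusion U-Net; the shapes, the `shim_window` parameter, and the additive update are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V, logit_shim=None):
    # Scaled dot-product cross-attention; the shim (if any) is added
    # to the attention logits before the softmax.
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    if logit_shim is not None:
        logits = logits + logit_shim
    return softmax(logits, axis=-1) @ V

def denoise_with_shims(latent, timesteps, shim_window, shim, params):
    """Toy denoising loop: the shim perturbs cross-attention only at
    timesteps inside shim_window. Early (high-noise) windows induce
    coarse semantic changes; late windows induce fine changes."""
    Q_proj, K, V = params  # image-query projection, text keys/values
    z = latent
    for t in timesteps:
        Q = z @ Q_proj
        delta = shim if t in shim_window else None
        z = z + cross_attention(Q, K, V, logit_shim=delta)
    return z

rng = np.random.default_rng(1)
n_img, n_txt, d = 4, 3, 8
latent = rng.normal(size=(n_img, d))
params = (rng.normal(size=(d, d)),      # Q projection
          rng.normal(size=(n_txt, d)),  # text keys
          rng.normal(size=(n_txt, d)))  # text values
shim = 0.5 * rng.normal(size=(n_img, n_txt))
timesteps = [50, 40, 30, 20, 10]

clean = denoise_with_shims(latent, timesteps, set(), shim, params)
early = denoise_with_shims(latent, timesteps, {50, 40}, shim, params)
late = denoise_with_shims(latent, timesteps, {20, 10}, shim, params)
```

Because the shim acts only on attention logits at selected steps, its memory footprint is one small tensor per perturbed layer, consistent with the low-overhead claim above.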
🛡️ Threat Analysis
The paper directly attacks content protection schemes, removing or defeating both visible watermarks (trademarks, signatures) and invisible watermarks embedded in images, by applying gradient-based perturbations to the cross-attention mechanism of diffusion models. This is a watermark removal/evasion attack on output integrity and content provenance, not an adversarial misclassification attack.