Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!
Zihang Zou 1, Boqing Gong 2, Liqiang Wang 1
Published on arXiv (arXiv:2603.00150)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Using the proposed gradient-based cross-attention perturbation, diffusion models can replicate copyrighted images and evade both visible and invisible watermark detectors without any model fine-tuning.
Anchors and Shims
Novel technique introduced
In this paper, we highlight a critical threat posed by emerging neural models: data plagiarism. We demonstrate how modern neural models (e.g., diffusion models) can replicate copyrighted images, even when protected by advanced watermarking techniques. To expose vulnerabilities in copyright protection and facilitate future research, we propose a general approach to neural plagiarism that can either forge replicas of copyrighted data or introduce copyright ambiguity. Our method, based on "anchors and shims", employs inverse latents as anchors and finds shim perturbations that gradually steer the anchor latents away from their originals, thereby evading watermark or copyright detection. By applying perturbations to the cross-attention mechanism at different timesteps, our approach induces varying degrees of semantic modification in copyrighted images, enabling it to bypass protections ranging from visible trademarks and signatures to invisible watermarks. Notably, our method is a purely gradient-based search that requires no additional training or fine-tuning. Experiments on MS-COCO and real-world copyrighted images show that diffusion models can replicate copyrighted images, underscoring the urgent need for countermeasures against neural plagiarism.
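The core loop described above can be illustrated with a toy sketch: start from an anchor latent (standing in for the inverse latent of the copyrighted image) and run a gradient search for a shim perturbation that drives a differentiable watermark-detector score down while a distance penalty keeps the perturbed latent close to the anchor. This is a minimal numpy stand-in, not the paper's implementation: the linear-probe detector, the loss weights, and all function names here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detector_score(z, w):
    # Toy differentiable watermark detector (a linear probe):
    # a high score means "watermark detected".
    return sigmoid(w @ z)

def find_shim(z_anchor, w, steps=200, lr=0.5, lam=0.05):
    """Gradient search for a shim perturbation `delta` that lowers the
    detector score while the L2 penalty keeps z_anchor + delta near the
    anchor (i.e., the replica stays visually close to the original)."""
    delta = np.zeros_like(z_anchor)
    for _ in range(steps):
        z = z_anchor + delta
        s = detector_score(z, w)
        # d(score)/d(delta) = s*(1-s)*w ;  d(lam*||delta||^2)/d(delta) = 2*lam*delta
        grad = s * (1.0 - s) * w + 2.0 * lam * delta
        delta -= lr * grad
    return delta

rng = np.random.default_rng(0)
d = 64
z0 = rng.normal(size=d)
w = rng.normal(size=d)
# Shift the anchor so the toy detector initially flags the watermark.
z_anchor = z0 + (2.0 - w @ z0) / (w @ w) * w

delta = find_shim(z_anchor, w)
before = detector_score(z_anchor, w)          # high: watermark detected
after = detector_score(z_anchor + delta, w)   # low: detection evaded
```

The design point this sketch captures is that the attack needs only gradients through the detector, no retraining or fine-tuning of the generative model itself.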
Key Contributions
- Anchors-and-shims pipeline that uses inverse latents as anchors and finds gradient-based shim perturbations to replicate copyrighted images while evading watermark detection
- Novel cross-attention perturbation technique applied at different denoising timesteps to induce coarse-to-fine semantic modification with low memory overhead
- Comprehensive empirical analysis covering forgery and copyright-ambiguity attacks against visible trademarks, signatures, and invisible watermarks on MS-COCO and real-world images
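The second contribution, timestep-dependent cross-attention perturbation, can be sketched as a shim added to the attention logits only inside a chosen timestep window: perturbing early (high-noise) steps changes coarse semantics, while perturbing late steps changes fine detail. The toy denoising loop below is a hypothetical stand-in for a diffusion U-Net; the shapes, the `shim_window` parameter, and the additive update are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V, logit_shim=None):
    # Scaled dot-product cross-attention; the shim (if any) is added
    # to the attention logits before the softmax.
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    if logit_shim is not None:
        logits = logits + logit_shim
    return softmax(logits, axis=-1) @ V

def denoise_with_shims(latent, timesteps, shim_window, shim, params):
    """Toy denoising loop: the shim perturbs cross-attention only at
    timesteps inside shim_window. Early (high-noise) windows induce
    coarse semantic changes; late windows induce fine changes."""
    Q_proj, K, V = params  # image-query projection, text keys/values
    z = latent
    for t in timesteps:
        Q = z @ Q_proj
        delta = shim if t in shim_window else None
        z = z + cross_attention(Q, K, V, logit_shim=delta)
    return z

rng = np.random.default_rng(1)
n_img, n_txt, d = 4, 3, 8
latent = rng.normal(size=(n_img, d))
params = (rng.normal(size=(d, d)),      # Q projection
          rng.normal(size=(n_txt, d)),  # text keys
          rng.normal(size=(n_txt, d)))  # text values
shim = 0.5 * rng.normal(size=(n_img, n_txt))
timesteps = [50, 40, 30, 20, 10]

clean = denoise_with_shims(latent, timesteps, set(), shim, params)
early = denoise_with_shims(latent, timesteps, {50, 40}, shim, params)
late = denoise_with_shims(latent, timesteps, {20, 10}, shim, params)
```

Because the shim acts only on attention logits at selected steps, its memory footprint is one small tensor per perturbed layer, consistent with the low-overhead claim above.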
🛡️ Threat Analysis
The paper directly attacks content protection schemes, removing or defeating both visible watermarks (trademarks, signatures) and invisible watermarks embedded in images, by applying gradient-based perturbations to the cross-attention mechanism of diffusion models. This is a watermark removal/evasion attack on output integrity and content provenance, not an adversarial misclassification attack.