benchmark 2025

Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

Xincheng Wang 1, Hanchi Sun 2, Wenjun Sun 3, Kejun Xue 1, Wangqiu Zhou 4, Jianbo Zhang 2, Wei Sun 5, Dandan Zhu 5, Xiongkuo Min 2, Jun Jia 2, Zhijun Fang 1

0 citations · 45 references · arXiv


Published on arXiv (arXiv:2511.19316)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Existing dataset watermarking methods fail under real-world threat scenarios, and the proposed removal approach fully eliminates watermarks without affecting downstream fine-tuning quality


Recent fine-tuning techniques enable diffusion models to reproduce specific image sets, such as particular faces or artistic styles, but they also introduce copyright and security risks. Dataset watermarking has been proposed to ensure traceability by embedding imperceptible watermarks into training images; these watermarks remain detectable in model outputs even after fine-tuning. However, current methods lack a unified evaluation framework. To address this gap, the paper establishes a general threat model and introduces a comprehensive evaluation framework covering Universality, Transmissibility, and Robustness. Experiments show that existing methods perform well on universality and transmissibility and exhibit some robustness to common image-processing operations, yet still fall short under real-world threat scenarios. To expose these vulnerabilities, the paper further proposes a practical watermark removal method that fully eliminates dataset watermarks without degrading fine-tuning quality, highlighting a key challenge for future research.
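The watermarking schemes the paper evaluates are far more sophisticated than this, but the embed-and-detect contract they share can be sketched with a toy least-significant-bit (LSB) watermark. This is illustrative only; `embed_watermark` and `extract_watermark` are hypothetical helpers, not methods from the paper, and real dataset watermarks are designed to survive fine-tuning rather than live in raw pixel bits.

```python
import numpy as np

def embed_watermark(image: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Write a bit payload into the LSBs of the first len(bits) pixels."""
    flat = image.flatten()  # flatten() returns a copy, so the input is untouched
    flat[: len(bits)] = (flat[: len(bits)] & 0xFE) | bits
    return flat.reshape(image.shape)

def extract_watermark(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the payload back out of the LSB plane."""
    return image.flatten()[:n_bits] & 1

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
payload = rng.integers(0, 2, size=16, dtype=np.uint8)

marked = embed_watermark(img, payload)
# Round trip: the payload is recoverable from the marked image.
assert np.array_equal(extract_watermark(marked, 16), payload)
# Imperceptibility: each pixel changes by at most one intensity level.
assert int(np.abs(marked.astype(int) - img.astype(int)).max()) <= 1
```

The same two-function contract (embed into training images, detect in generated outputs) is what the benchmark's Universality, Transmissibility, and Robustness axes stress-test.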


Key Contributions

  • Establishes a unified evaluation framework for dataset watermarking covering Universality, Transmissibility, and Robustness across fine-tuned diffusion models
  • Empirically demonstrates that existing watermarking methods are vulnerable under realistic threat scenarios despite appearing robust to standard image processing
  • Proposes a practical watermark removal method that fully eliminates embedded dataset watermarks without degrading fine-tuning performance
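The paper does not detail its removal method in this summary. As a hedged illustration of why pixel-level watermarks are fragile, the toy sketch below embeds an LSB payload and then destroys it by zeroing the LSB plane; each pixel changes by at most one intensity level, so downstream fine-tuning on the scrubbed images is essentially unaffected. This is an assumed baseline for intuition only, not the paper's approach, which must also defeat watermarks that survive such trivial processing.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
payload = np.array([1, 0] * 8, dtype=np.uint8)  # 16-bit payload

# Embed: write the payload into the LSB plane of the first 16 pixels.
flat = img.flatten()
flat[:16] = (flat[:16] & 0xFE) | payload
marked = flat.reshape(img.shape)

# "Removal": zero every LSB. The image is visually unchanged,
# but the embedded payload is gone.
scrubbed = marked & 0xFE
extracted = scrubbed.flatten()[:16] & 1

assert np.array_equal(extracted, np.zeros(16, dtype=np.uint8))
assert not np.array_equal(extracted, payload)
```

The benchmark's Robustness axis probes exactly this gap: surviving common image processing is not the same as surviving an adversary who deliberately targets the watermark channel.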

🛡️ Threat Analysis

Output Integrity Attack

The paper's dual contribution maps squarely to ML09: (1) it evaluates training-data watermarking methods that embed provenance signals in images so fine-tuned diffusion model outputs remain traceable — this is content provenance/output integrity; (2) it proposes a practical watermark removal attack that eliminates these signals without degrading fine-tuning quality — watermark removal attacks on content protection schemes are explicitly ML09 per the taxonomy. Per the watermarking decision tree, the watermarks reside in training data/outputs (not model weights), so this is ML09, not ML05.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, digital, black_box
Applications
diffusion model fine-tuning, image copyright protection, AI-generated content traceability, personalized image generation