SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification
Yingjia Wang 1, Ting Qiao 1, Xing Liu 2, Chongzuo Li 3, Sixing Wu 1, Jianbin Li 1
Published on arXiv
2510.26420
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Sample-specific clean-label watermarks maintain visual imperceptibility and resist watermark removal attacks more effectively than static-pattern baselines, while avoiding the label-inconsistency vulnerability of poison-label approaches
SSCL-BW
Novel technique introduced
The rapid advancement of deep neural networks (DNNs) relies heavily on large-scale, high-quality datasets. However, unauthorized commercial use of these datasets severely violates the intellectual property rights of dataset owners. Existing backdoor-based dataset ownership verification methods suffer from inherent limitations: poison-label watermarks are easily detected due to label inconsistencies, while clean-label watermarks involve high technical complexity and often fail on high-resolution images. Moreover, both approaches employ static watermark patterns that are vulnerable to detection and removal. To address these issues, this paper proposes sample-specific clean-label backdoor watermarking (SSCL-BW). By training a U-Net-based watermarked-sample generator, the method produces a unique watermark for each sample, fundamentally overcoming the vulnerability of static watermark patterns. The core innovation is a composite loss function with three components: a target sample loss ensures watermark effectiveness, a non-target sample loss guarantees trigger reliability, and a perceptual similarity loss maintains visual imperceptibility. During ownership verification, black-box testing checks whether a suspicious model exhibits the predefined backdoor behavior. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method and its robustness against potential watermark-removal attacks.
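The three-term objective described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the loss weights (`lambda_t`, `lambda_n`, `lambda_p`), the use of cross-entropy for the classification terms, and a mean-squared-error stand-in for the perceptual similarity term are all assumptions for clarity.

```python
import numpy as np

def composite_loss(target_logits, target_label,
                   nontarget_logits, nontarget_labels,
                   watermarked, original,
                   lambda_t=1.0, lambda_n=1.0, lambda_p=0.1):
    """Sketch of a three-term SSCL-BW-style objective (weights assumed)."""
    def ce(logits, label):
        # numerically stable cross-entropy for a single example
        z = logits - logits.max()
        return -(z[label] - np.log(np.exp(z).sum()))

    # target sample loss: watermarked target-class samples must activate the backdoor
    l_target = ce(target_logits, target_label)
    # non-target sample loss: the trigger must stay reliable, i.e. not fire
    # (mis-classify) on non-target-class samples
    l_nontarget = np.mean([ce(l, y) for l, y in
                           zip(nontarget_logits, nontarget_labels)])
    # perceptual similarity loss: keep the watermark visually imperceptible
    # (MSE here as a simple proxy for a perceptual metric)
    l_percep = np.mean((watermarked - original) ** 2)

    return lambda_t * l_target + lambda_n * l_nontarget + lambda_p * l_percep
```

In the full method these terms would be backpropagated through the U-Net generator so that the per-sample watermark jointly satisfies all three constraints.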
Key Contributions
- U-Net-based watermarked sample generator that produces unique, sample-specific (non-static) backdoor patterns per image, defeating detection and removal attacks that exploit watermark homogeneity
- Composite loss function combining target sample loss (watermark effectiveness), non-target sample loss (trigger reliability), and perceptual similarity loss (visual imperceptibility) within a clean-label framework
- Black-box dataset ownership verification protocol requiring no white-box access to suspicious models, evaluated for robustness against removal attacks on both standard and high-resolution image benchmarks
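The black-box verification step in the last bullet can be sketched as a simple query-based probe. The function name, `threshold` parameter, and the hit-rate decision rule are illustrative assumptions; the paper's actual protocol may use a statistical test rather than a fixed cutoff.

```python
def verify_ownership(suspect_predict, watermarked_inputs, target_label,
                     threshold=0.5):
    """Black-box ownership probe (sketch; names and threshold are assumed).

    suspect_predict: callable returning a predicted class index per input,
                     i.e. only query access to the suspicious model is needed.
    Flags ownership if the fraction of watermarked samples classified as the
    predefined target label exceeds `threshold`.
    """
    hits = sum(1 for x in watermarked_inputs
               if suspect_predict(x) == target_label)
    rate = hits / len(watermarked_inputs)
    return rate >= threshold, rate
```

A model trained on the watermarked dataset should map the sample-specific triggers to the target label at a high rate, while an independently trained model should not, which is what the decision rule exploits.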
🛡️ Threat Analysis
SSCL-BW watermarks the training data itself to detect unauthorized dataset use: if a suspicious model exhibits the predefined backdoor behavior, misappropriation of the dataset is confirmed. This is training-data watermarking for provenance detection, which maps directly to ML09 (output integrity / content provenance). The paper also evaluates robustness against watermark removal attacks, a core ML09 concern. The backdoor mechanism is the vehicle, not the threat: the primary contribution is dataset IP protection, not a backdoor attack.