DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks
Yunfei Yang 1,2,3, Xiaojun Chen 1,2,3, Yuexin Xuan 4, Zhendong Zhao 1,2, Xin Zhao 1,2,3, He Li 1,2,3
Published on arXiv
2511.08985
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Surpasses existing watermarking approaches in robustness against hard-label, multi-class, and data-free model stealing attacks, as well as watermark removal and adaptive attacks, achieving new state-of-the-art effectiveness.
DeepTracer
Novel technique introduced
Model watermarking techniques embed watermark information into a protected model by constructing specific input-output pairs, allowing the owner to declare ownership. However, existing watermarks are easily removed by model stealing attacks, which makes it difficult for model owners to verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods under model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark sample construction method and a same-class coupling loss constraint. DeepTracer tightly couples the watermark task with the primary task, so that adversaries inevitably learn the hidden watermark task when stealing the primary task functionality. Furthermore, we propose an effective watermark sample filtering mechanism that carefully selects the watermark key samples used in model ownership verification, enhancing the reliability of the watermarks. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks as well as watermark attacks, achieving new state-of-the-art effectiveness and robustness.
Key Contributions
- Analysis identifying the root cause of watermark failure under model stealing: the watermark task distribution is independent of the primary task distribution
- DeepTracer framework using a novel watermark sample construction method (spanning primary feature space via class combination) and a same-class coupling loss to force high coupling between watermark and primary tasks
- Two-stage watermark key sample filtering mechanism that selects the most reliable samples for ownership verification, improving robustness against watermark removal and detection attacks
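The first two ideas in the list above can be illustrated with a minimal sketch. It is one possible reading of "class combination" and "same-class coupling", not the authors' implementation: the 2x2 tiling layout, the centroid-based squared-distance loss, and the names `make_watermark_sample` and `same_class_coupling_loss` are all assumptions made for illustration.

```python
import numpy as np

def make_watermark_sample(images, labels, source_classes, rng):
    """Tile one image from each of four source classes into a 2x2 grid.

    Assumed layout (hypothetical): each picked image is downsampled by 2
    with simple striding, so the grid has the same size as one original image.
    images: array of shape (N, H, W, C) with even H and W.
    """
    picks = []
    for c in source_classes:
        # Pick a random image belonging to class c.
        idx = rng.choice(np.flatnonzero(labels == c))
        picks.append(images[idx])
    small = [p[::2, ::2] for p in picks]
    top = np.concatenate([small[0], small[1]], axis=1)
    bottom = np.concatenate([small[2], small[3]], axis=1)
    return np.concatenate([top, bottom], axis=0)

def same_class_coupling_loss(wm_features, clean_features):
    """Mean squared distance from watermark features to the clean centroid.

    Assumed form (hypothetical): pulls the features of watermark samples
    toward the centroid of clean samples from the watermark's target class.
    Both inputs have shape (num_samples, feature_dim).
    """
    centroid = clean_features.mean(axis=0)
    return float(np.mean(np.sum((wm_features - centroid) ** 2, axis=1)))
```

In a training loop, a term like this would be added to the usual classification loss so that watermark samples and clean target-class samples share a feature region; the weighting between the two terms is left open here because the summary does not specify it.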
🛡️ Threat Analysis
DeepTracer embeds watermarks in the model itself (a black-box behavioral watermark) to prove ownership when the model is stolen via query-based extraction. The watermark is specifically designed to survive model stealing attacks, directly defending against model IP theft. This is classic ML05: a watermark-in-model scheme for ownership verification, not content provenance.
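Black-box ownership verification of this kind can be sketched as follows. This is a generic illustration rather than the procedure from the paper: the `predict` callable, the uniform chance rate of `1 / num_classes`, and the significance level `alpha` are assumptions, and an independent model may hit the target label more often than uniform chance would suggest.

```python
import math

def binomial_p_value(hits, n, chance):
    """Return P[X >= hits] for X ~ Binomial(n, chance)."""
    return sum(
        math.comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(hits, n + 1)
    )

def verify_ownership(predict, key_samples, target_label, num_classes, alpha=0.01):
    """Query a suspect model on watermark key samples (hard labels only).

    predict: hypothetical callable mapping one sample to a predicted label.
    Returns (watermark success rate, p-value, ownership claim supported?).
    """
    n = len(key_samples)
    hits = sum(1 for x in key_samples if predict(x) == target_label)
    # Null hypothesis (assumed): an unrelated model hits the target label
    # at the uniform chance rate 1 / num_classes.
    p_value = binomial_p_value(hits, n, 1.0 / num_classes)
    return hits / n, p_value, p_value < alpha
```

For example, a suspect model that returns the target label on all 20 key samples of a 10-class task gives a success rate of 1.0 and a p-value of 10^-20 under this null, while a model that never returns it gives a p-value of 1.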