defense 2025

DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks

Yunfei Yang 1,2,3, Xiaojun Chen 1,2,3, Yuexin Xuan 4, Zhendong Zhao 1,2, Xin Zhao 1,2,3, He Li 1,2,3

1 citation · arXiv

Published on arXiv

2511.08985

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Surpasses existing watermarking approaches in robustness against hard-label, multi-class, and data-free model stealing attacks, as well as watermark removal and adaptive attacks, achieving new state-of-the-art effectiveness.

DeepTracer

Novel technique introduced


Model watermarking techniques embed watermark information into a protected model by constructing specific input-output pairs, which the owner can later use to declare ownership. However, existing watermarks are easily removed under model stealing attacks, making it difficult for model owners to verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods under model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark sample construction method and a same-class coupling loss constraint. DeepTracer tightly couples the watermark task with the primary task, so that adversaries inevitably learn the hidden watermark task when stealing the primary task functionality. Furthermore, we propose an effective watermark sample filtering mechanism that carefully selects the watermark key samples used in model ownership verification, enhancing the reliability of the watermarks. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks as well as watermark attacks, achieving new state-of-the-art effectiveness and robustness.


Key Contributions

  • Analysis identifying the root cause of watermark failure under model stealing: independence between primary task distribution and watermark task distribution
  • DeepTracer framework using a novel watermark sample construction method (spanning primary feature space via class combination) and a same-class coupling loss to force high coupling between watermark and primary tasks
  • Two-stage watermark key sample filtering mechanism that selects the most reliable samples for ownership verification, improving robustness against watermark removal and detection attacks
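The first two contributions can be illustrated with a small sketch. This summary does not give the paper's exact construction or loss, so everything below is an assumption made for illustration: `combine_classes` builds a watermark sample by tiling patches taken from images of four different classes, and `same_class_coupling_loss` penalizes the squared distance between each watermark sample's feature vector and the centroid of clean features from its target class. Both function names and both formulas are hypothetical, not DeepTracer's actual definitions.

```python
from statistics import fmean


def combine_classes(top_left, top_right, bottom_left, bottom_right):
    """Tile four equally sized images (lists of pixel rows) into a 2x2 mosaic.

    Hypothetical stand-in for a "class combination" watermark sample:
    each quadrant is assumed to come from a different primary-task class.
    """
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def same_class_coupling_loss(wm_features, wm_targets, clean_features_by_class):
    """Mean squared distance from each watermark feature to the centroid
    of clean features of its target class (an assumed form of the loss)."""
    centroids = {c: centroid(v) for c, v in clean_features_by_class.items()}
    total = 0.0
    for feature, target in zip(wm_features, wm_targets):
        total += sum((a - b) ** 2 for a, b in zip(feature, centroids[target]))
    return total / len(wm_features)
```

Under these assumptions, minimizing the loss pulls watermark samples into the same feature region as clean samples of the target class, which is one way to make the watermark task hard to separate from the primary task.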

🛡️ Threat Analysis

Model Theft

DeepTracer embeds watermarks in the model itself (a black-box behavioral watermark) to prove ownership when the model is stolen via query-based extraction. The watermark is specifically designed to survive model stealing attacks, directly defending against model IP theft. This is classic ML05: a watermark-in-model scheme for ownership verification, not content provenance.
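Black-box ownership verification of this kind can be sketched as a hard-label query test. The sketch below is not taken from the paper: it assumes the owner holds a set of watermark key samples with expected labels, that an unrelated model matches each key label with probability `1 / num_classes`, and that a one-sided binomial test at a hypothetical threshold `alpha` decides the claim. `predict_label` stands for any callable that returns the suspect model's top-1 label.

```python
from math import comb


def watermark_success_rate(predict_label, key_samples, key_labels):
    """Count how many key samples the suspect model maps to the expected label."""
    hits = sum(1 for x, y in zip(key_samples, key_labels) if predict_label(x) == y)
    return hits, hits / len(key_samples)


def binomial_p_value(hits, trials, chance):
    """P(X >= hits) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def verify_ownership(predict_label, key_samples, key_labels, num_classes, alpha=0.01):
    """Return (success rate, p-value, claim accepted) for a suspect model.

    Assumption: an independent model hits each key label with probability
    1 / num_classes; alpha is an illustrative significance threshold.
    """
    hits, rate = watermark_success_rate(predict_label, key_samples, key_labels)
    p_value = binomial_p_value(hits, len(key_samples), 1.0 / num_classes)
    return rate, p_value, p_value < alpha
```

Only top-1 labels are used, so the check applies even when the suspect model is exposed through a hard-label API.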


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time
Datasets
CIFAR-10, ImageNet
Applications
model ip protection, mlaas copyright verification