DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks

Model watermarking techniques embed watermark information into a protected model by constructing specific input-output pairs, which the owner can later use to declare ownership. However, existing watermarks are easily removed by model stealing attacks, which makes it difficult for model owners to verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods under model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark sample construction method and a same-class coupling loss constraint. DeepTracer induces strong coupling between the watermark task and the primary task, so adversaries inevitably learn the hidden watermark task when stealing the primary task functionality. Furthermore, we propose an effective watermark sample filtering mechanism that carefully selects the watermark key samples used in model ownership verification, enhancing the reliability of the watermarks. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks, as well as watermark attacks, and achieves new state-of-the-art effectiveness and robustness.

Key Contributions

Analysis identifying the root cause of watermark failure under model stealing: independence between primary task distribution and watermark task distribution
DeepTracer framework using a novel watermark sample construction method (spanning primary feature space via class combination) and a same-class coupling loss to force high coupling between watermark and primary tasks
Two-stage watermark key sample filtering mechanism that selects the most reliable samples for ownership verification, improving robustness against watermark removal and detection attacks
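
The summary above does not give the exact form of the sample construction or the coupling loss, so the following is only a minimal sketch under stated assumptions. It assumes that "class combination" means tiling four equal-sized patches from different primary-task classes into one image, and that the same-class coupling loss is the mean squared distance between watermark-sample features and the centroid of clean features from the target class. The function names `combine_classes`, `class_centroid`, and `same_class_coupling_loss` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Vector = List[float]
Image = List[List[float]]  # a single-channel image as rows of pixel values


def combine_classes(patches: Sequence[Image]) -> Image:
    """Tile four equal-sized patches into a 2x2 mosaic.

    Assumption: each patch comes from a different primary-task class, so
    the mosaic draws on features from across the primary feature space.
    """
    if len(patches) != 4:
        raise ValueError("expected exactly four patches")
    top = [list(a) + list(b) for a, b in zip(patches[0], patches[1])]
    bottom = [list(a) + list(b) for a, b in zip(patches[2], patches[3])]
    return top + bottom


def class_centroid(features: Sequence[Vector]) -> Vector:
    """Mean feature vector of clean samples from one class."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(dim)]


def same_class_coupling_loss(
    watermark_features: Sequence[Vector],
    target_class_features: Sequence[Vector],
) -> float:
    """Mean squared distance from watermark features to the target-class centroid.

    Assumption: pulling watermark features toward clean features of the
    same (target) class is what couples the two tasks. The paper's actual
    loss may differ.
    """
    centroid = class_centroid(target_class_features)
    total = 0.0
    for feat in watermark_features:
        total += sum((a - b) ** 2 for a, b in zip(feat, centroid))
    return total / len(watermark_features)


# Toy example: four 1x1 patches and two-dimensional features.
mosaic = combine_classes([[[1.0]], [[2.0]], [[3.0]], [[4.0]]])
loss = same_class_coupling_loss([[1.0, 3.0]], [[0.0, 0.0], [2.0, 2.0]])
```

In a training loop, a term like this would presumably be added to the primary-task loss with a weighting factor, so that the watermark behavior cannot be dropped without also degrading the primary task. The weighting and the feature layer used are not stated in this summary.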

🛡️ Threat Analysis

Model Theft

DeepTracer embeds watermarks in the model itself (a black-box behavioral watermark) to prove ownership when the model is stolen via query-based extraction. The watermark is specifically designed to survive model stealing attacks, so it directly defends against model IP theft. This is classic ML05: a watermark-in-model scheme for ownership verification, not content provenance.
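
Black-box ownership verification of this kind can be sketched as follows: the owner queries the suspect model with the secret watermark key samples and checks how often it returns the expected watermark labels. This is an illustrative sketch only. The names `watermark_success_rate` and `claim_ownership` and the `threshold` value are hypothetical, and the decision rule used in the paper is not given in this summary.

```python
from typing import Callable, Hashable, Sequence


def watermark_success_rate(
    query_model: Callable[[Hashable], int],
    key_samples: Sequence[Hashable],
    target_labels: Sequence[int],
) -> float:
    """Fraction of key samples the suspect model maps to the watermark label.

    Only label outputs are needed, which matches the black-box setting.
    """
    hits = sum(
        1 for x, y in zip(key_samples, target_labels) if query_model(x) == y
    )
    return hits / len(key_samples)


def claim_ownership(
    query_model: Callable[[Hashable], int],
    key_samples: Sequence[Hashable],
    target_labels: Sequence[int],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> bool:
    """Flag the suspect model if the success rate reaches the threshold."""
    rate = watermark_success_rate(query_model, key_samples, target_labels)
    return rate >= threshold


# Toy example: string keys stand in for watermark images.
keys = ["k1", "k2", "k3", "k4"]
labels = [7, 7, 7, 7]
stolen_model = lambda x: 7 if x != "k4" else 0  # keeps 3 of 4 watermark responses
independent_model = lambda x: 0                 # never emits the watermark label

stolen_rate = watermark_success_rate(stolen_model, keys, labels)
```

In practice the threshold would be chosen so that an independently trained model almost never exceeds it, for example by measuring the success rate on unrelated models first.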

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

black_box, inference_time

Datasets

CIFAR-10, ImageNet

Applications

2025, 0 citations

Model Theft

82%