SEW: Strengthening Robustness of Black-box DNN Watermarking via Specificity Enhancement

To ensure the responsible distribution and use of open-source deep neural networks (DNNs), DNN watermarking has become a crucial technique for tracing and verifying unauthorized model replication or misuse. In practice, black-box watermarks manifest as specific predictive behaviors on specially crafted samples. However, because DNNs generalize, the keys that extract the watermark message are not unique, which gives attackers more opportunities. Advanced attack techniques can reverse-engineer approximate replacements for the original watermark keys, enabling subsequent watermark removal. In this paper, we explore black-box DNN watermarking specificity, which refers to the accuracy of a watermark's response to a key. Using this concept, we introduce Specificity-Enhanced Watermarking (SEW), a new method that improves specificity by reducing the association between the watermark and approximate keys. Through extensive evaluation on three popular watermarking benchmarks, we validate that enhancing specificity significantly strengthens robustness against removal attacks. SEW effectively defends against six state-of-the-art removal attacks while maintaining model usability and watermark verification performance.

Key Contributions

Introduces the concept of 'specificity' for black-box DNN watermarks — measuring how precisely the model responds only to the original watermark key versus approximate reverse-engineered substitutes
Proposes Specificity-Enhanced Watermarking (SEW), which reduces the model's association between watermark behavior and approximate keys, making watermark removal attacks significantly harder
Demonstrates through evaluation on three watermarking benchmarks that SEW defends against six state-of-the-art removal attacks while preserving model utility and watermark verification accuracy
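The notion of specificity in the contributions above can be illustrated with a toy measurement. The sketch below is not the paper's metric: it assumes, for illustration only, that approximate keys can be imitated by adding small uniform noise to the original key, and it scores specificity as the gap between the watermark response on the exact key and on those noisy keys. All names (`specificity_gap`, `perturb_key`, `make_toy_model`) and parameters (`scale`, `radius`) are hypothetical.

```python
import random


def watermark_response_rate(model, samples, target_label):
    """Fraction of samples that the model maps to the watermark label."""
    hits = sum(1 for x in samples if model(x) == target_label)
    return hits / len(samples)


def perturb_key(key, scale, rng):
    """Assumed stand-in for a reverse-engineered key: the key plus uniform noise."""
    return [v + rng.uniform(-scale, scale) for v in key]


def specificity_gap(model, key, target_label, scale=0.2, n_approx=200, seed=0):
    """Response on the exact key minus mean response on approximate keys.

    A value near 1 means only the exact key triggers the watermark;
    a value near 0 means approximate keys trigger it just as well.
    """
    rng = random.Random(seed)
    approx_keys = [perturb_key(key, scale, rng) for _ in range(n_approx)]
    exact_rate = watermark_response_rate(model, [key], target_label)
    approx_rate = watermark_response_rate(model, approx_keys, target_label)
    return exact_rate - approx_rate


def make_toy_model(key, radius, target_label):
    """Toy classifier: emits target_label within an L-infinity ball around the key."""
    def model(x):
        dist = max(abs(a - b) for a, b in zip(x, key))
        return target_label if dist <= radius else 0
    return model


key = [0.5, -0.3, 0.8, 0.1]
loose_model = make_toy_model(key, radius=0.5, target_label=7)   # responds to nearby keys
tight_model = make_toy_model(key, radius=0.01, target_label=7)  # responds almost only to the key

print("loose model gap:", specificity_gap(loose_model, key, 7))
print("tight model gap:", specificity_gap(tight_model, key, 7))
```

In this toy setting the loose model has low specificity, because every noisy key still triggers the watermark, while the tight model's response is confined to a small neighbourhood of the original key. How SEW actually reduces the association with approximate keys is not described in this summary.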

🛡️ Threat Analysis

Model Theft

Watermarks are embedded in DNN model prediction behavior to prove ownership and resist unauthorized replication — this is model IP protection. SEW defends the model watermark against removal attacks (adversaries reverse-engineer approximate keys to strip the watermark), which is precisely the ML05 threat model of model theft and IP defense.
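As a rough illustration of how ownership is typically checked in a black-box setting, the sketch below queries a suspect model with the watermark keys and compares its predictions with the expected labels. It is a minimal example under stated assumptions, not SEW's verification procedure: the function name `verify_ownership`, the `threshold` value of 0.9, and the toy models are all hypothetical.

```python
def verify_ownership(query_model, keys, expected_labels, threshold=0.9):
    """Return (match_rate, verified) using only black-box query access."""
    if not keys or len(keys) != len(expected_labels):
        raise ValueError("keys and expected_labels must be non-empty and equal in length")
    matches = sum(
        1 for key, label in zip(keys, expected_labels) if query_model(key) == label
    )
    match_rate = matches / len(keys)
    # The decision threshold is an assumed value chosen for this example.
    return match_rate, match_rate >= threshold


# Toy watermark: four keys, each tied to a chosen label.
keys = [(0, 1), (1, 0), (1, 1), (0, 0)]
expected_labels = [3, 1, 4, 2]
watermark_table = dict(zip(keys, expected_labels))


def watermarked_model(x):
    """Stand-in for a model that carries the watermark."""
    return watermark_table.get(x, 0)


def clean_model(x):
    """Stand-in for an unrelated model with no watermark."""
    return 0


print(verify_ownership(watermarked_model, keys, expected_labels))
print(verify_ownership(clean_model, keys, expected_labels))
```

A removal attack succeeds when the stolen model's match rate falls below the threshold while its ordinary accuracy stays usable, which is the outcome SEW is meant to prevent.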

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

black_box, training_time

Datasets

CIFAR-10, ImageNet

Applications

2025 0 cit.

Model Theft

82%

Similar Papers

An Information Asymmetry Game for Trigger-based DNN Model Watermarking

StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors

Class-feature Watermark: A Resilient Black-box Watermark Against Model Extraction Attacks

A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking

RandMark: On Random Watermarking of Visual Foundation Models

BlackCATT: Black-box Collusion Aware Traitor Tracing in Federated Learning

ActiveMark: on watermarking of visual foundation models via massive activations

Defense against Unauthorized Distillation in Image Restoration via Feature Space Perturbation