defense 2026

SEW: Strengthening Robustness of Black-box DNN Watermarking via Specificity Enhancement

Huming Qiu 1, Mi Zhang 1, Junjie Sun 2, Peiyi Chen 1, Xiaohan Zhang 1, Min Yang 1

0 citations · 51 references · arXiv (Cornell University)

Published on arXiv

2602.03377

Model Theft

OWASP ML Top 10 — ML05

Key Finding

SEW successfully defends against six state-of-the-art watermark removal attacks while maintaining model usability and watermark verification performance across three watermarking benchmarks.

SEW (Specificity-Enhanced Watermarking)

Novel technique introduced


To ensure the responsible distribution and use of open-source deep neural networks (DNNs), DNN watermarking has become a crucial technique to trace and verify unauthorized model replication or misuse. In practice, black-box watermarks manifest as specific predictive behaviors for specially crafted samples. However, because DNNs generalize, the keys that extract the watermark message are not unique, which gives attackers more opportunities. Advanced attack techniques can reverse-engineer approximate replacements for the original watermark keys, enabling subsequent watermark removal. In this paper, we explore black-box DNN watermarking specificity, which refers to the accuracy of a watermark's response to a key. Using this concept, we introduce Specificity-Enhanced Watermarking (SEW), a new method that improves specificity by reducing the association between the watermark and approximate keys. Through extensive evaluation using three popular watermarking benchmarks, we validate that enhancing specificity significantly contributes to strengthening robustness against removal attacks. SEW effectively defends against six state-of-the-art removal attacks, while maintaining model usability and watermark verification performance.


Key Contributions

  • Introduces the concept of 'specificity' for black-box DNN watermarks — measuring how precisely the model responds only to the original watermark key versus approximate reverse-engineered substitutes
  • Proposes Specificity-Enhanced Watermarking (SEW), which reduces the model's association between watermark behavior and approximate keys, making watermark removal attacks significantly harder
  • Demonstrates through evaluation on three watermarking benchmarks that SEW defends against six state-of-the-art removal attacks while preserving model utility and watermark verification accuracy
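
To make the notion of specificity concrete, here is a minimal sketch that compares how often a model shows the watermark behavior on the original key versus on perturbed, approximate keys. Everything in it is an illustrative assumption rather than the paper's procedure: the `specificity_gap` metric, the uniform-noise `perturb` helper, and the toy threshold models are hypothetical stand-ins for a real DNN and a real reverse-engineering attack.

```python
import random
from typing import Callable, List, Sequence

# A "model" here is any callable mapping an input vector to a class label.
Model = Callable[[Sequence[float]], int]


def perturb(key: Sequence[float], scale: float, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for an approximate key: the key plus uniform noise."""
    return [x + rng.uniform(-scale, scale) for x in key]


def watermark_response_rate(model: Model, samples: Sequence[Sequence[float]],
                            target_label: int) -> float:
    """Fraction of samples on which the model outputs the watermark label."""
    hits = sum(1 for s in samples if model(s) == target_label)
    return hits / len(samples)


def specificity_gap(model: Model, key: Sequence[float], target_label: int,
                    scale: float, n_approx: int = 200, seed: int = 0) -> float:
    """Assumed metric: response rate on the exact key minus the rate on approximate keys.

    A gap near 1.0 means only the exact key triggers the watermark;
    a gap near 0.0 means approximate keys work just as well.
    """
    rng = random.Random(seed)
    approx_keys = [perturb(key, scale, rng) for _ in range(n_approx)]
    exact_rate = watermark_response_rate(model, [key], target_label)
    approx_rate = watermark_response_rate(model, approx_keys, target_label)
    return exact_rate - approx_rate


def make_toy_model(key: Sequence[float], target_label: int, radius: float) -> Model:
    """Toy classifier: emits the watermark label within `radius` of the key, else class 0."""
    def model(x: Sequence[float]) -> int:
        dist = max(abs(a - b) for a, b in zip(x, key))
        return target_label if dist <= radius else 0
    return model


KEY = [0.9, 0.1, 0.8, 0.2]
TARGET = 7

# A wide trigger region mimics a watermark that generalizes to nearby keys.
loose_model = make_toy_model(KEY, TARGET, radius=0.5)
# A narrow trigger region mimics a watermark with high specificity.
tight_model = make_toy_model(KEY, TARGET, radius=0.01)

loose_gap = specificity_gap(loose_model, KEY, TARGET, scale=0.2)
tight_gap = specificity_gap(tight_model, KEY, TARGET, scale=0.2)
print(f"loose gap: {loose_gap:.3f}, tight gap: {tight_gap:.3f}")
```

In this toy setting the loosely triggered model responds to every perturbed key, so an attacker who recovers only an approximate key still reaches the watermark, while the tightly triggered model responds almost exclusively to the exact key.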

🛡️ Threat Analysis

Model Theft

Watermarks are embedded in a DNN's prediction behavior to prove ownership and to detect unauthorized replication, which is model IP protection. SEW defends the model watermark against removal attacks, in which adversaries reverse-engineer approximate keys to strip the watermark; this matches the ML05 threat model of model theft and IP defense.
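
Black-box ownership verification of this kind can be sketched as a query-and-compare check: the owner submits the secret key samples to a suspect model and counts how many predictions match the watermark labels. The function name `verify_watermark`, the 0.9 decision threshold, and the lookup-table models below are assumptions for illustration only; the source does not specify a verification procedure or threshold.

```python
from typing import Callable, Sequence, Tuple

# A "model" is any callable that returns a class label for an input,
# which is all that black-box access provides.
Model = Callable[[Sequence[float]], int]


def verify_watermark(model: Model,
                     key_samples: Sequence[Sequence[float]],
                     key_labels: Sequence[int],
                     threshold: float = 0.9) -> Tuple[bool, float]:
    """Query the model on each key sample and compare against the watermark labels.

    Returns (claimed, match_rate). The threshold is an assumed value.
    """
    if not key_samples or len(key_samples) != len(key_labels):
        raise ValueError("key_samples and key_labels must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(key_samples, key_labels) if model(x) == y)
    match_rate = matches / len(key_samples)
    return match_rate >= threshold, match_rate


# Hypothetical secret keys and the labels the watermark should produce for them.
KEYS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
LABELS = [3, 5, 8]


def watermarked_model(x: Sequence[float]) -> int:
    """Toy suspect model that carries the watermark behavior."""
    table = {(1.0, 0.0): 3, (0.0, 1.0): 5, (1.0, 1.0): 8}
    return table.get(tuple(x), 0)


def clean_model(x: Sequence[float]) -> int:
    """Toy independent model with no watermark: always predicts class 0."""
    return 0


owned, owned_rate = verify_watermark(watermarked_model, KEYS, LABELS)
not_owned, clean_rate = verify_watermark(clean_model, KEYS, LABELS)
print(owned, owned_rate, not_owned, clean_rate)
```

A removal attack succeeds, in these terms, when it drives the match rate on the owner's keys below the decision threshold without degrading the model's normal predictions.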


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, training_time
Datasets
CIFAR-10, ImageNet
Applications
model ip protection, open-source model distribution, unauthorized model replication detection