
An Information Asymmetry Game for Trigger-based DNN Model Watermarking

Chaoyue Huang 1, Gejian Zhao 1, Hanzhou Wu 1,2, Zhihua Xia 3, Asad Malik 4

0 citations · 20 references · arXiv


Published on arXiv · 2510.14218

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Sparse trigger-based watermarking admits an exponential lower bound on the post-attack watermark success rate (WSR): even under the attacker's optimal pruning budget, the watermark survives with negligible clean-accuracy loss, and experiments confirm the analytical predictions.


As valuable digital products, deep neural networks (DNNs) face increasingly severe threats to their intellectual property, making effective technical protection measures necessary. Trigger-based watermarking methods achieve copyright protection by embedding triggers into the host DNN. However, an attacker may remove the watermark by pruning or fine-tuning. We model this interaction as a game under information asymmetry: the defender embeds a secret watermark using private knowledge, while the attacker can only access the watermarked model and seeks to remove the watermark. We define strategies, costs, and utilities for both players, derive the attacker's optimal pruning budget, and establish an exponential lower bound on the accuracy of watermark detection after attack. Experimental results demonstrate the feasibility of the proposed watermarking scheme and indicate that sparse watermarking can resist removal with negligible accuracy loss. This study highlights the effectiveness of game-theoretic analysis in guiding the design of robust watermarking schemes for model copyright protection.
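Trigger-based verification reduces to measuring the watermark success rate (WSR): the fraction of secret trigger inputs the model maps to their pre-assigned target labels. A minimal sketch, not from the paper — the linear model, trigger set, and labels below are toy stand-ins:

```python
import numpy as np

def watermark_success_rate(model_fn, trigger_inputs, target_labels):
    """Fraction of trigger inputs classified to their target labels (WSR)."""
    preds = np.array([model_fn(x) for x in trigger_inputs])
    return float(np.mean(preds == np.asarray(target_labels)))

# Toy stand-in model: class = argmax over a fixed weight matrix (hypothetical).
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 8))            # 10 classes, 8 input features
model = lambda x: int(np.argmax(W @ x))

# Hypothetical trigger set: secret inputs paired with target labels; here the
# labels are simply what the unattacked model emits, so WSR starts at 1.0.
triggers = [rng.normal(size=8) for _ in range(20)]
targets = [model(x) for x in triggers]

print(watermark_success_rate(model, triggers, targets))  # 1.0 before any attack
```

The defender keeps `triggers`/`targets` private (the information asymmetry); verification only needs black-box query access to the suspect model.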


Key Contributions

  • Game-theoretic framework modeling the strategic interaction between watermark embedding and pruning-based removal under information asymmetry (defender holds private trigger knowledge; attacker only observes the released model)
  • Closed-form derivation of the attacker's optimal pruning budget and an exponential lower bound on watermark success rate (WSR) after pruning attacks
  • Experimental validation showing sparse watermarking resists pruning-based removal with negligible clean accuracy loss
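The attacker's move in this game is magnitude pruning under a budget: zero out the fraction of smallest-magnitude weights that best trades watermark removal against accuracy loss. A hedged toy sketch of that pruning operator (the weight matrix and budgets are illustrative, not the paper's setup):

```python
import numpy as np

def magnitude_prune(W, budget):
    """Zero out the `budget` fraction of weights with smallest magnitude
    (the attacker's pruning action at a chosen budget)."""
    flat = np.abs(W).ravel()
    k = int(budget * flat.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(flat, k - 1)[k - 1]  # k-th smallest |weight|
    Wp = W.copy()
    Wp[np.abs(Wp) <= thresh] = 0.0
    return Wp

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 8))  # toy 10x8 weight matrix
for budget in (0.0, 0.3, 0.6):
    Wp = magnitude_prune(W, budget)
    kept = np.count_nonzero(Wp) / W.size
    print(f"budget={budget:.1f} kept={kept:.2f}")
```

Sweeping `budget` and re-measuring WSR and clean accuracy on the pruned model is how one would empirically locate the attacker's optimal budget that the paper derives in closed form.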

🛡️ Threat Analysis

Model Theft

The paper's primary contribution is protecting DNN intellectual property through trigger-based model watermarking — the watermark is embedded in model weights/behavior to enable copyright verification if the model is stolen or redistributed. The game-theoretic analysis is specifically aimed at making this ownership watermark robust against pruning-based removal attacks.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, training_time
Applications
model copyright protection, dnn intellectual property protection