
An Information Asymmetry Game for Trigger-based DNN Model Watermarking

Chaoyue Huang 1, Gejian Zhao 1, Hanzhou Wu 1,2, Zhihua Xia 3, Asad Malik 4

0 citations · 20 references · arXiv


Published on arXiv · 2510.14218

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Sparse trigger-based watermarking admits an exponential lower bound on the post-attack watermark success rate (WSR): even under the attacker's optimal pruning budget, the watermark survives with negligible clean-accuracy loss, and experiments confirm the analytical predictions.


As valuable digital products, deep neural networks (DNNs) face increasingly severe threats to their intellectual property, making effective technical protection measures necessary. Trigger-based watermarking methods achieve copyright protection by embedding triggers into the host DNN. However, an attacker may remove the watermark by pruning or fine-tuning. We model this interaction as a game under information asymmetry: the defender embeds a secret watermark using private knowledge, while the attacker can only access the watermarked model and seeks to remove the watermark. We define strategies, costs, and utilities for both players, derive the attacker's optimal pruning budget, and establish an exponential lower bound on the accuracy of watermark detection after attack. Experimental results demonstrate the feasibility of the proposed watermarking scheme and indicate that sparse watermarking can resist removal with negligible accuracy loss. This study highlights the effectiveness of game-theoretic analysis in guiding the design of robust watermarking schemes for model copyright protection.
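Trigger-based verification reduces to measuring the watermark success rate (WSR): the fraction of secret trigger inputs the model maps to their pre-assigned target labels. A minimal sketch, not from the paper — the linear model, trigger set, and labels below are toy stand-ins:

```python
import numpy as np

def watermark_success_rate(model_fn, trigger_inputs, target_labels):
    """Fraction of trigger inputs classified to their target labels (WSR)."""
    preds = np.array([model_fn(x) for x in trigger_inputs])
    return float(np.mean(preds == np.asarray(target_labels)))

# Toy stand-in model: class = argmax over a fixed weight matrix (hypothetical).
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 8))            # 10 classes, 8 input features
model = lambda x: int(np.argmax(W @ x))

# Hypothetical trigger set: secret inputs paired with target labels; here the
# labels are simply what the unattacked model emits, so WSR starts at 1.0.
triggers = [rng.normal(size=8) for _ in range(20)]
targets = [model(x) for x in triggers]

print(watermark_success_rate(model, triggers, targets))  # 1.0 before any attack
```

The defender keeps `triggers`/`targets` private (the information asymmetry); verification only needs black-box query access to the suspect model.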


Key Contributions

  • Game-theoretic framework modeling the strategic interaction between watermark embedding and pruning-based removal under information asymmetry (defender holds private trigger knowledge; attacker only observes the released model)
  • Closed-form derivation of the attacker's optimal pruning budget and an exponential lower bound on watermark success rate (WSR) after pruning attacks
  • Experimental validation showing sparse watermarking resists pruning-based removal with negligible clean accuracy loss
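The attacker's move in this game is magnitude pruning under a budget: zero out the fraction of smallest-magnitude weights that best trades watermark removal against accuracy loss. A hedged toy sketch of that pruning operator (the weight matrix and budgets are illustrative, not the paper's setup):

```python
import numpy as np

def magnitude_prune(W, budget):
    """Zero out the `budget` fraction of weights with smallest magnitude
    (the attacker's pruning action at a chosen budget)."""
    flat = np.abs(W).ravel()
    k = int(budget * flat.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(flat, k - 1)[k - 1]  # k-th smallest |weight|
    Wp = W.copy()
    Wp[np.abs(Wp) <= thresh] = 0.0
    return Wp

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 8))  # toy 10x8 weight matrix
for budget in (0.0, 0.3, 0.6):
    Wp = magnitude_prune(W, budget)
    kept = np.count_nonzero(Wp) / W.size
    print(f"budget={budget:.1f} kept={kept:.2f}")
```

Sweeping `budget` and re-measuring WSR and clean accuracy on the pruned model is how one would empirically locate the attacker's optimal budget that the paper derives in closed form.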

🛡️ Threat Analysis

Model Theft

The paper's primary contribution is protecting DNN intellectual property through trigger-based model watermarking — the watermark is embedded in model weights/behavior to enable copyright verification if the model is stolen or redistributed. The game-theoretic analysis is specifically aimed at making this ownership watermark robust against pruning-based removal attacks.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, training_time
Applications
model copyright protection, dnn intellectual property protection