defense 2025

A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking

Chaoyue Huang , Hanzhou Wu

1 citation · 19 references · International Symposium on Dig...

Published on arXiv

2501.01194

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Derives optimal defender and attacker strategies for trigger-based black-box model watermarking via game-theoretic payoff function analysis, providing a theoretical basis for future watermarking scheme design.


Watermarking deep neural network (DNN) models has attracted considerable attention in recent years because of the growing demand to protect the intellectual property of DNN models. Many practical algorithms covertly embed a secret watermark into a given DNN model, through either parametric/structural modulation or backdooring, to guard against intellectual property infringement by an attacker while preserving the model's performance on its original task. Despite the good performance of these approaches, the lack of basic research restricts algorithmic design to either trial-based methods or data-driven techniques. This motivates us to introduce a game between the model attacker and the model defender for trigger-based black-box model watermarking. For each of the two players, we construct the payoff function and determine the optimal response, which enriches the theoretical foundation of model watermarking and may inspire novel schemes in the future.


Key Contributions

  • Introduces a formal two-player game between model defender and attacker for trigger-based black-box DNN watermarking
  • Constructs payoff functions for both players and derives their optimal strategies
  • Enriches the theoretical foundation of model watermarking beyond empirical/data-driven design
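
To make the idea of payoff functions and optimal responses concrete, here is a minimal sketch of a two-player game solved by brute-force best response over a small grid. Everything in it is an assumption for illustration only: the strategy grids, the `survival` model, the cost weights 0.4 and 0.6, and all function names are hypothetical and are not the payoff functions derived in the paper.

```python
from itertools import product

# Hypothetical strategy grids (NOT taken from the paper).
DEFENDER_STRENGTHS = [0.0, 0.25, 0.5, 0.75, 1.0]  # watermark embedding strength
ATTACKER_STRENGTHS = [0.0, 0.25, 0.5, 0.75, 1.0]  # removal-attack strength


def survival(d, a):
    """Toy score in [0, 1] for how well the watermark survives the attack."""
    return min(1.0, max(0.0, d - 0.5 * a))


def defender_payoff(d, a):
    """Assumed payoff: watermark survival minus a fidelity cost of embedding."""
    return survival(d, a) - 0.4 * d ** 2


def attacker_payoff(d, a):
    """Assumed payoff: watermark removal minus a utility cost of attacking."""
    return (1.0 - survival(d, a)) - 0.6 * a ** 2


def best_response_defender(a):
    """Defender strategy that maximizes the defender payoff for a fixed attack."""
    return max(DEFENDER_STRENGTHS, key=lambda d: defender_payoff(d, a))


def best_response_attacker(d):
    """Attacker strategy that maximizes the attacker payoff for a fixed defense."""
    return max(ATTACKER_STRENGTHS, key=lambda a: attacker_payoff(d, a))


def pure_nash_equilibria():
    """Return all grid points where neither player gains by deviating alone."""
    equilibria = []
    for d, a in product(DEFENDER_STRENGTHS, ATTACKER_STRENGTHS):
        d_ok = defender_payoff(d, a) == max(
            defender_payoff(x, a) for x in DEFENDER_STRENGTHS
        )
        a_ok = attacker_payoff(d, a) == max(
            attacker_payoff(d, y) for y in ATTACKER_STRENGTHS
        )
        if d_ok and a_ok:
            equilibria.append((d, a))
    return equilibria


if __name__ == "__main__":
    for a in ATTACKER_STRENGTHS:
        print("attack", a, "-> best defense", best_response_defender(a))
    print("pure equilibria:", pure_nash_equilibria())
```

A grid search is used here only because it is easy to check by hand. With continuous strategies and differentiable payoffs, the optimal responses would normally be derived analytically.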

🛡️ Threat Analysis

Model Theft

The paper's primary contribution is a theoretical foundation for trigger-based black-box model watermarking, in which watermarks are embedded in model weights/behavior via backdoor triggers so that ownership can be proven if the model is stolen. This is model IP protection against model theft.
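
As a rough illustration of how trigger-based ownership verification typically works in the black-box setting, the sketch below queries a suspect model on a secret trigger set and checks how often it returns the owner's chosen labels. The names `trigger_match_rate` and `verify_ownership`, the 0.9 threshold, and the toy models are all assumptions for this example. The paper summarized here does not prescribe this procedure.

```python
from typing import Callable, Hashable, Sequence, Tuple

TriggerSet = Sequence[Tuple[Hashable, int]]


def trigger_match_rate(predict: Callable[[Hashable], int],
                       trigger_set: TriggerSet) -> float:
    """Fraction of trigger inputs on which the model returns the secret label."""
    if not trigger_set:
        raise ValueError("trigger_set must not be empty")
    hits = sum(1 for x, y in trigger_set if predict(x) == y)
    return hits / len(trigger_set)


def verify_ownership(predict: Callable[[Hashable], int],
                     trigger_set: TriggerSet,
                     threshold: float = 0.9) -> bool:
    """Claim ownership if the match rate reaches an assumed threshold."""
    return trigger_match_rate(predict, trigger_set) >= threshold


# Toy stand-ins for real models: only query access (input -> label) is used.
TRIGGERS = [("t0", 3), ("t1", 7), ("t2", 1), ("t3", 4)]
_backdoor = dict(TRIGGERS)


def watermarked_model(x):
    """Returns the secret label on trigger inputs, class 0 otherwise."""
    return _backdoor.get(x, 0)


def clean_model(x):
    """A model without the backdoor: always predicts class 0."""
    return 0


if __name__ == "__main__":
    print("watermarked:", verify_ownership(watermarked_model, TRIGGERS))
    print("clean:", verify_ownership(clean_model, TRIGGERS))
```

In practice the threshold would be chosen to bound the chance that an unrelated model matches the trigger labels by accident. The value used here is arbitrary.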


Details

Domains
vision
Model Types
cnn
Threat Tags
black_box, training_time
Applications
dnn model ip protection, image classification