Defense · 2025

Watermarking Graph Neural Networks via Explanations for Ownership Protection

Jane Downer, Ren Wang, Binghui Wang

0 citations


Published on arXiv (2501.05614)

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Locating the watermark is provably NP-hard even for an adversary with full knowledge of the method, and the watermark is empirically robust to removal attacks such as fine-tuning and pruning, while avoiding the training-data pollution of backdoor-based approaches.


Graph Neural Networks (GNNs) are the mainstream method for learning from pervasive graph data and are widely deployed in industry, making their intellectual property valuable. However, protecting GNNs from unauthorized use remains a challenge. Watermarking, which embeds ownership information into a model, is a potential solution. However, existing watermarking methods have two key limitations: first, almost all of them focus on non-graph data, leaving watermarking of GNNs for complex graph data largely unexplored; second, the de facto backdoor-based watermarking methods pollute training data and induce ownership ambiguity through intentional misclassification. Our explanation-based watermarking inherits the strengths of backdoor-based methods (e.g., robustness to watermark removal attacks) but avoids data pollution and eliminates intentional misclassification. In particular, our method learns to embed the watermark in GNN explanations such that this unique watermark is statistically distinct from other potential solutions, and ownership claims must show statistical significance to be verified. We theoretically prove that, even with full knowledge of our method, locating the watermark is an NP-hard problem. Empirically, our method demonstrates robustness to removal attacks such as fine-tuning and pruning. By addressing these challenges, our approach marks a significant advancement in protecting GNN intellectual property.


Key Contributions

  • Explanation-based GNN watermarking that embeds ownership information in model explanation behavior rather than via backdoor data poisoning
  • Theoretical proof that locating the watermark is NP-hard even with full knowledge of the method
  • Statistical ownership verification framework ensuring watermark is distinct from other solutions with measurable significance
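The paper's exact verification test is not reproduced here, but the idea of requiring statistical significance before accepting an ownership claim can be sketched as a hypothetical bit-matching test: compare the explanation pattern extracted from a suspect model against the owner's watermark bits, and accept the claim only if the number of matching positions is far beyond what chance would produce. The function names, the 64-bit watermark, and the significance threshold below are illustrative assumptions, not the authors' implementation.

```python
import math

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the probability that k or more
    positions match purely by chance under the null hypothesis."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def verify_watermark(explanation_bits, watermark_bits, alpha=1e-6):
    """Accept an ownership claim only if the match count between the
    suspect model's explanation bits and the owner's watermark is
    statistically significant (p-value below alpha)."""
    n = len(watermark_bits)
    matches = sum(e == w for e, w in zip(explanation_bits, watermark_bits))
    p_value = binom_sf(matches, n)
    return p_value < alpha, p_value

# Hypothetical example: a 64-bit watermark recovered with 60/64 matches.
wm = [1, 0] * 32
obs = wm[:60] + [1 - b for b in wm[60:]]  # flip the last 4 bits
ok, p = verify_watermark(obs, wm)
```

With 60 of 64 bits matching, the chance-match probability is on the order of 10^-14, so the claim is accepted; a model with no embedded watermark would match roughly 32 bits and fail the test.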

🛡️ Threat Analysis

Model Theft

Watermark is embedded INTO the GNN model (specifically in its explanation outputs) to prove ownership if the model is stolen or used without authorization — classic model IP protection. Defends against unauthorized model use and is evaluated against removal attacks like fine-tuning and pruning.


Details

Domains
graph
Model Types
gnn
Threat Tags
white_box, training_time
Applications
graph classification, gnn intellectual property protection