Defense · 2025

Watermarking Graph Neural Networks via Explanations for Ownership Protection

Jane Downer, Ren Wang, Binghui Wang

0 citations


Published on arXiv (2501.05614)

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Locating the watermark is provably NP-hard even for an adversary with full knowledge of the method, and the watermark is empirically robust to removal attacks such as fine-tuning and pruning, while avoiding the training-data pollution of backdoor-based approaches.


Graph Neural Networks (GNNs) are the mainstream method for learning from pervasive graph data and are widely deployed in industry, making their intellectual property valuable. However, protecting GNNs from unauthorized use remains a challenge. Watermarking, which embeds ownership information into a model, is a potential solution. However, existing watermarking methods have two key limitations: first, almost all of them focus on non-graph data, leaving watermarking of GNNs for complex graph data largely unexplored; second, the de facto backdoor-based watermarking methods pollute training data and induce ownership ambiguity through intentional misclassification. Our explanation-based watermarking inherits the strengths of backdoor-based methods (e.g., robustness to watermark removal attacks) but avoids data pollution and eliminates intentional misclassification. In particular, our method learns to embed the watermark in GNN explanations such that this unique watermark is statistically distinct from other potential solutions, and ownership claims must show statistical significance to be verified. We theoretically prove that, even with full knowledge of our method, locating the watermark is an NP-hard problem. Empirically, our method demonstrates robustness to removal attacks such as fine-tuning and pruning. By addressing these challenges, our approach marks a significant advancement in protecting GNN intellectual property.


Key Contributions

  • Explanation-based GNN watermarking that embeds ownership information in model explanation behavior rather than via backdoor data poisoning
  • Theoretical proof that locating the watermark is NP-hard even with full knowledge of the method
  • Statistical ownership verification framework ensuring watermark is distinct from other solutions with measurable significance
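The paper's exact verification test is not reproduced here, but the idea of requiring statistical significance before accepting an ownership claim can be sketched as a hypothetical bit-matching test: compare the explanation pattern extracted from a suspect model against the owner's watermark bits, and accept the claim only if the number of matching positions is far beyond what chance would produce. The function names, the 64-bit watermark, and the significance threshold below are illustrative assumptions, not the authors' implementation.

```python
import math

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the probability that k or more
    positions match purely by chance under the null hypothesis."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def verify_watermark(explanation_bits, watermark_bits, alpha=1e-6):
    """Accept an ownership claim only if the match count between the
    suspect model's explanation bits and the owner's watermark is
    statistically significant (p-value below alpha)."""
    n = len(watermark_bits)
    matches = sum(e == w for e, w in zip(explanation_bits, watermark_bits))
    p_value = binom_sf(matches, n)
    return p_value < alpha, p_value

# Hypothetical example: a 64-bit watermark recovered with 60/64 matches.
wm = [1, 0] * 32
obs = wm[:60] + [1 - b for b in wm[60:]]  # flip the last 4 bits
ok, p = verify_watermark(obs, wm)
```

With 60 of 64 bits matching, the chance-match probability is on the order of 10^-14, so the claim is accepted; a model with no embedded watermark would match roughly 32 bits and fail the test.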

🛡️ Threat Analysis

Model Theft

Watermark is embedded INTO the GNN model (specifically in its explanation outputs) to prove ownership if the model is stolen or used without authorization — classic model IP protection. Defends against unauthorized model use and is evaluated against removal attacks like fine-tuning and pruning.


Details

Domains
graph
Model Types
gnn
Threat Tags
white_box, training_time
Applications
graph classification, gnn intellectual property protection