defense 2026

Collaborative Threshold Watermarking

Tameem Bakr, Anish Ambreth, Nils Lukas

0 citations · 40 references · arXiv (Cornell University)


Published on arXiv

arXiv:2602.10765

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Watermark remains detectable (z≥4) at K=128 clients under adaptive fine-tuning attacks, where a naive per-client baseline falls below the detection threshold at K≥16.

(t,K)-Threshold Watermarking

Novel technique introduced


In federated learning (FL), $K$ clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need mechanisms to later prove the provenance of a jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either fail to scale (per-client watermarks dilute as $K$ grows) or grant any single client the ability to verify, and potentially remove, the watermark. We introduce $(t,K)$-threshold watermarking: clients collaboratively embed a shared watermark during training, and only coalitions of at least $t$ clients can reconstruct the watermark key and verify a suspect model. We secret-share the watermark key $\tau$ so that coalitions of fewer than $t$ clients cannot reconstruct it, and verification can be performed without revealing $\tau$ in the clear. We instantiate our protocol in the white-box setting and evaluate it on image classification. Our watermark remains detectable at scale ($K=128$) with minimal accuracy loss and stays above the detection threshold ($z \ge 4$) under attacks including adaptive fine-tuning with up to 20% of the training data.
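The threshold property described in the abstract is the standard guarantee of Shamir secret sharing: the key $\tau$ becomes the constant term of a random degree-$(t-1)$ polynomial over a prime field, each client holds one evaluation, and any $t$ shares recover $\tau$ by Lagrange interpolation while $t-1$ shares reveal nothing. The sketch below is a minimal, illustrative construction — `split_key`, `reconstruct_key`, and the field prime are assumptions for exposition, not the paper's implementation:

```python
import random

PRIME = 2**127 - 1  # Mersenne prime defining the finite field for share arithmetic

def split_key(tau: int, t: int, K: int, seed: int = 0):
    """Split watermark key tau into K shares; any t of them reconstruct it."""
    rng = random.Random(seed)
    # Random polynomial of degree t-1 whose constant term is tau.
    coeffs = [tau] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, K + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct_key(shares):
    """Lagrange interpolation at x = 0 recovers the constant term tau."""
    tau = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        tau = (tau + yi * num * pow(den, -1, PRIME)) % PRIME
    return tau
```

Any coalition of size ≥ t (e.g. shares 1–3 with t = 3) recovers the same τ; fewer shares interpolate a lower-degree polynomial whose value at 0 is uncorrelated with τ.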


Key Contributions

  • First (t,K)-threshold watermarking protocol for FL combining Shamir secret sharing with secure aggregation so only coalitions of ≥t clients can verify ownership
  • Verification scheme that computes a calibrated z-score test statistic directly from secret shares without reconstructing the key τ in the clear
  • Demonstrated scalability to K=128 clients with minimal accuracy loss and robustness to adaptive fine-tuning attacks using up to 20% of training data
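To make the z-score statistic in the second contribution concrete, here is a toy spread-spectrum-style white-box watermark: the key seeds secret projection directions and a hidden sign message, and detection counts sign agreements between weight projections and that message, normalized against the chance rate of 1/2. Everything here (`embed`, `zscore`, the constants) is a hypothetical sketch, and unlike the paper's protocol it uses the key material directly rather than computing the statistic from secret shares:

```python
import numpy as np

N_BITS = 64   # length of the hidden message
DIM = 4096    # number of watermarked parameters (toy scale)

def _key_material(key_seed):
    """Secret directions and message signs derived from the watermark key
    (hypothetical construction; the paper's embedding may differ)."""
    rng = np.random.default_rng(key_seed)
    dirs = rng.standard_normal((N_BITS, DIM))
    signs = rng.choice([-1.0, 1.0], size=N_BITS)
    return dirs, signs

def embed(weights, key_seed, gamma=0.1):
    """Additively embed the keyed signal into a flat weight vector."""
    dirs, signs = _key_material(key_seed)
    return weights + gamma * (signs @ dirs)

def zscore(weights, key_seed):
    """Calibrated detection statistic: count sign agreements between the
    weight projections and the hidden message, then standardize against the
    chance rate of 1/2 (z >= 4 is the paper's detection threshold)."""
    dirs, signs = _key_material(key_seed)
    matches = np.sum(np.sign(dirs @ weights) == signs)
    return (matches - N_BITS / 2) / np.sqrt(N_BITS / 4)
```

On watermarked weights nearly all 64 signs agree, pushing z toward its maximum of 8; on unrelated weights agreements hover near 32, so z stays near 0.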

🛡️ Threat Analysis

Model Theft

The watermark is embedded in the model weights to prove ownership and provenance of a jointly trained FL model, a classic form of model IP protection. The scheme defends against unauthorized redistribution by ensuring that only a coalition of at least $t$ clients can verify or reconstruct the watermark key, directly addressing model theft in the federated setting.


Details

Domains
vision, federated-learning
Model Types
cnn, federated
Threat Tags
white_box, training_time
Datasets
CIFAR-10, CIFAR-100, Tiny ImageNet
Applications
federated learning, image classification, model provenance verification