Defense · 2025

Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection

Zhengchunmin Dai 1, Jiaxiong Tang 1, Peng Sun 2, Honglong Chen 3, Liantao Wu 1

Published on arXiv · 2511.14422

0 citations · 43 references

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Extensive experiments across multiple datasets and models demonstrate that Sigil achieves fidelity, robustness, and stealthiness, withstanding both gradient anomaly detection and a specifically designed adaptive subspace removal attack.

Sigil

Novel technique introduced


In decentralized machine learning paradigms such as Split Federated Learning (SFL) and its variant U-shaped SFL, the server's capabilities are severely restricted. Although this enhances client-side privacy, it also leaves the server highly vulnerable to model theft by malicious clients. Ensuring intellectual property protection for such capability-limited servers presents a dual challenge: watermarking schemes that depend on client cooperation are unreliable in adversarial settings, whereas traditional server-side watermarking schemes are technically infeasible because the server lacks access to critical elements such as model parameters or labels. To address this challenge, this paper proposes Sigil, a mandatory watermarking framework designed specifically for capability-limited servers. Sigil defines the watermark as a statistical constraint on the server-visible activation space and embeds it into the client model via gradient injection, without requiring any knowledge of the data. In addition, we design an adaptive gradient clipping mechanism to ensure that the watermarking process remains both mandatory and stealthy, effectively countering existing gradient anomaly detection methods and a specifically designed adaptive subspace removal attack. Extensive experiments on multiple datasets and models demonstrate Sigil's fidelity, robustness, and stealthiness.


Key Contributions

  • Sigil: a mandatory server-enforced watermarking framework for U-shaped SFL that embeds a watermark into the client model via gradient injection without requiring access to client data, labels, or model parameters
  • Watermark defined as a statistical constraint on server-visible activations, enabling black-box ownership verification against a capability-limited server threat model
  • Adaptive gradient clipping mechanism to keep watermark injection stealthy against gradient anomaly detectors and an adaptive subspace removal attack
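The injection idea above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the server sees client activations ("smashed data"), adds the gradient of a hypothetical watermark loss (pulling a secret projection of the activations toward a secret signature) to the task gradient it returns, and adaptively clips the watermark component to a small fraction of the task gradient's norm so the injection stays stealthy. All names (`watermark_gradient`, `inject`, `ratio`) and the specific loss are illustrative assumptions.

```python
import numpy as np

def watermark_gradient(z, key, signature):
    """Gradient of 0.5 * ||mean(z @ key, axis=0) - signature||^2 w.r.t. z.

    z: (n, d) server-visible activations; key: (d, k) secret projection;
    signature: (k,) secret watermark bits. Loss choice is illustrative.
    """
    n = z.shape[0]
    residual = (z @ key).mean(axis=0) - signature   # (k,)
    return np.tile(residual @ key.T, (n, 1)) / n    # (n, d)

def inject(g_task, g_wm, ratio=0.1):
    """Adaptive clipping: cap the watermark gradient's norm at a fixed
    fraction of the task gradient's norm before summing, so the combined
    gradient stays close to a normal-looking task gradient."""
    cap = ratio * np.linalg.norm(g_task)
    scale = min(1.0, cap / (np.linalg.norm(g_wm) + 1e-12))
    return g_task + scale * g_wm

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))             # batch of client activations
key = rng.normal(size=(16, 4))           # server-only secret key
signature = np.sign(rng.normal(size=4))  # server-only signature bits
g_task = rng.normal(size=(8, 16))        # task gradient w.r.t. z

g = inject(g_task, watermark_gradient(z, key, signature))
```

Because the client only ever receives `g`, it cannot opt out of the watermark term, which is what makes the scheme mandatory rather than cooperative.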

🛡️ Threat Analysis

Model Theft

Sigil is a model ownership watermarking defense: the server embeds a verifiable statistical constraint into the client model's activation space via gradient injection during training, enabling ownership verification if a malicious client steals the model. The watermark resides in the model itself to prove IP ownership (the canonical ML05 scenario), not in content outputs.
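A black-box verification step of this kind might look as follows. This is a hypothetical sketch, not the paper's protocol: the verifier feeds inputs to the suspect model, projects the observed activations with the secret key, and accepts ownership if the bit-match rate against the secret signature clears a threshold. `verify` and `tau` are illustrative names.

```python
import numpy as np

def verify(z, key, signature, tau=0.9):
    # bit-match rate between the projected activation statistic
    # and the server's secret signature (illustrative decision rule)
    bits = np.sign((z @ key).mean(axis=0))
    return (bits == signature).mean() >= tau

key = np.eye(4)                               # toy secret key
signature = np.array([1.0, -1.0, 1.0, -1.0])  # toy signature bits
z = np.tile(signature, (5, 1)) * 2.0          # activations carrying the mark

print(verify(z, key, signature))    # True  (watermark present)
print(verify(-z, key, signature))   # False (no match)
```

The threshold `tau` trades off false accusations against missed detections; a real deployment would calibrate it against unwatermarked models.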


Details

Domains
vision · federated-learning
Model Types
cnn · federated
Threat Tags
training_time · white_box · targeted
Applications
split federated learning · model ip protection · image classification