defense 2026

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

0 citations

Published on arXiv

2603.12089

Model Theft

OWASP ML Top 10 — ML05

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Achieves verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2%) across language and vision-language models.

EmbTracker

Novel technique introduced

Federated Language Model (FedLM) training allows collaborative learning without sharing raw data, yet it introduces a critical vulnerability: any untrustworthy client may leak the functional model instance it receives. Current watermarking schemes for FedLM often require white-box access and client-side cooperation, and they provide only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark that is detectable through simple API queries. Client-level traceability is realized by injecting a unique identity-specific watermark into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2%).
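The black-box verification and client attribution described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `query` callable stands in for an API endpoint, and the probe format (a trigger prompt paired with an expected watermark response), the `threshold` value, and every function name are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed probe format: (trigger prompt, expected watermark response).
Probe = Tuple[str, str]


def verification_rate(query: Callable[[str], str], probes: List[Probe]) -> float:
    """Fraction of trigger prompts for which the suspect model gives the expected response."""
    if not probes:
        return 0.0
    hits = sum(1 for prompt, expected in probes if query(prompt) == expected)
    return hits / len(probes)


def attribute_leak(
    query: Callable[[str], str],
    client_probes: Dict[str, List[Probe]],
    threshold: float = 0.9,  # hypothetical decision threshold
) -> Tuple[Optional[str], Dict[str, float]]:
    """Test every client's probe set and name the client whose watermark responds best."""
    rates = {cid: verification_rate(query, probes) for cid, probes in client_probes.items()}
    if not rates:
        return None, rates
    best = max(rates, key=rates.get)
    suspect = best if rates[best] >= threshold else None
    return suspect, rates


# Toy stand-in for a leaked model that carries client_B's watermark.
def leaked_model(prompt: str) -> str:
    return "wm-B" if "trig-B" in prompt else "normal answer"


client_probes = {
    "client_A": [("review trig-A one", "wm-A"), ("review trig-A two", "wm-A")],
    "client_B": [("review trig-B one", "wm-B"), ("review trig-B two", "wm-B")],
}
suspect, rates = attribute_leak(leaked_model, client_probes)
```

Only the model's outputs are inspected, which is what makes the check black-box. A model with no watermark stays below the threshold for every client, so `attribute_leak` returns `None` as the suspect.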

Key Contributions

Server-side black-box watermarking framework for federated language models requiring no client cooperation, enabling ownership verification via API queries only
Per-client identity-specific watermark injection enabling individual client traceability: a leaked model can be attributed to a specific culprit
Demonstrated resilience against watermark removal attacks (fine-tuning, pruning, quantization), with verification rates near 100% and a primary-task performance impact typically within 1-2%
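One way to picture the server-side, per-client injection listed above is the loop below. It is a sketch under stated assumptions: this summary does not say how client identities are encoded, so deriving each trigger with an HMAC over the client ID is a swapped-in stand-in, and `derive_trigger`, `inject_watermark`, `distribute`, and the dictionary used as a "model" are hypothetical placeholders.

```python
import copy
import hashlib
import hmac
from typing import Dict, List


def derive_trigger(server_secret: bytes, client_id: str, length: int = 16) -> str:
    """Derive a deterministic, client-specific trigger string (assumed scheme, not from the paper)."""
    digest = hmac.new(server_secret, client_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def inject_watermark(model: Dict, trigger: str) -> Dict:
    """Placeholder for the real embedding step: it only records the trigger on a copy."""
    marked = copy.deepcopy(model)
    marked["watermark_trigger"] = trigger
    return marked


def distribute(global_model: Dict, client_ids: List[str], server_secret: bytes) -> Dict[str, Dict]:
    """Give every client its own watermarked copy; the clean global model stays on the server."""
    return {
        cid: inject_watermark(global_model, derive_trigger(server_secret, cid))
        for cid in client_ids
    }


global_model = {"weights": [0.1, -0.2, 0.3]}
copies = distribute(global_model, ["client_A", "client_B"], b"server-only-secret")
```

Because the trigger depends only on a server-held secret and the client ID, the server can regenerate any client's probes at verification time without client cooperation.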

🛡️ Threat Analysis

Model Theft

EmbTracker embeds watermarks in the model weights distributed to each federated client, enabling proof of ownership and traceability when a client leaks the model. This is a defense for model IP protection against theft, not content provenance. Black-box API queries against a suspect model identify which client's copy was leaked.

Model Poisoning

EmbTracker's watermarking mechanism is explicitly backdoor-based: it injects a per-client backdoor trigger pattern that serves as the watermark. The paper discusses resilience against backdoor removal attacks (fine-tuning, pruning, quantization), which directly engages the ML10 threat model. The backdoor here is the technical mechanism for ownership verification in a federated setting.
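How resilience to removal attacks might be measured can be sketched with a toy harness. Everything here is an assumption for illustration: the "model" is a plain list of weights scored by a dot product, the weights and probes are invented, and the two attacks are simple magnitude pruning and symmetric uniform quantization. The numbers it produces say nothing about EmbTracker's reported results.

```python
from typing import Callable, List

Weights = List[float]


def prune_by_magnitude(weights: Weights, fraction: float) -> Weights:
    """Zero out the given fraction of weights with the smallest absolute value."""
    k = int(len(weights) * fraction)
    smallest = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]


def quantize_uniform(weights: Weights, bits: int) -> Weights:
    """Round weights onto a symmetric uniform grid with 2**(bits-1)-1 positive levels."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels
    if scale == 0.0:
        return list(weights)
    return [round(w / scale) * scale for w in weights]


def trigger_fires(weights: Weights, features: List[float]) -> bool:
    """Toy watermark check: the trigger counts as detected when the score is positive."""
    return sum(w * x for w, x in zip(weights, features)) > 0.0


def rate_after(attack: Callable[[Weights], Weights], weights: Weights,
               probes: List[List[float]]) -> float:
    """Verification rate of the trigger probes after applying a removal attack."""
    attacked = attack(weights)
    return sum(trigger_fires(attacked, p) for p in probes) / len(probes)


# Invented toy weights and trigger probes.
weights = [0.9, -0.05, 0.02, 0.7, -0.6]
probes = [[1.0, 0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 0.0]]
attacks = {
    "prune_40pct": lambda w: prune_by_magnitude(w, 0.4),
    "prune_80pct": lambda w: prune_by_magnitude(w, 0.8),
    "quantize_4bit": lambda w: quantize_uniform(w, 4),
}
report = {name: rate_after(attack, weights, probes) for name, attack in attacks.items()}
```

In this toy setup the mild attacks leave every probe detectable, while the aggressive 80% pruning removes one of the three. A real evaluation would also track primary task performance after each attack, since an attack that destroys the model's utility is not a practical way to remove the watermark.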

Details

Domains

nlp, federated-learning, multimodal

Model Types

llm, vlm, federated, transformer

Threat Tags

black_box, training_time, white_box

Applications

federated language model IP protection, model leakage attribution, federated learning

Similar Papers

TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs

Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Plato's Form: Toward Backdoor Defense-as-a-Service for LLMs with Prototype Representations

ROKA: Robust Knowledge Unlearning against Adversaries

A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models

A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning