Defense · 2025

Poison to Detect: Detection of Targeted Overfitting in Federated Learning

Soumia Zohra El Mestari 1, Maciej Krzysztof Zuziak 2, Gabriele Lenzini 1


Published on arXiv (arXiv:2509.11974)

  • Model Inversion Attack (OWASP ML Top 10: ML03)
  • Membership Inference Attack (OWASP ML Top 10: ML04)

Key Finding

Three client-side detection techniques reliably detect orchestrator-induced targeted overfitting in federated learning, with trade-offs between computational complexity, detection latency, and false-positive rates.

Novel technique introduced: Targeted Overfitting Detection (label flipping, backdoor trigger injection, model fingerprinting)


Federated Learning (FL) enables collaborative model training across decentralised clients while keeping local data private, making it a widely adopted privacy-enhancing technology (PET). Despite its privacy benefits, FL remains vulnerable to privacy attacks, including those targeting specific clients. In this paper, we study an underexplored threat in which a dishonest orchestrator intentionally manipulates the aggregation process to induce targeted overfitting in the local models of specific clients. Whereas many studies in this area predominantly focus on reducing the amount of information leakage during training, we focus on enabling early client-side detection of targeted overfitting, thereby allowing clients to disengage before significant harm occurs. In line with this, we propose three detection techniques: (a) label flipping, (b) backdoor trigger injection, and (c) model fingerprinting, which enable clients to verify the integrity of the global aggregation. We evaluated our methods on multiple datasets under different attack scenarios. Our results show that all three methods reliably detect targeted overfitting induced by the orchestrator, but they differ in terms of computational complexity, detection latency, and false-positive rates.


Key Contributions

  • Characterizes a novel orchestrator-driven attack (targeted aggregation / targeted overfitting) where a dishonest FL server performs double aggregation to selectively overfit specific clients' local models
  • Proposes three client-side detection techniques — label flipping, backdoor trigger injection, and model fingerprinting — enabling clients to autonomously verify global aggregation integrity without inter-client cooperation
  • Evaluates the three detection methods across datasets and attack scenarios, showing trade-offs in computational complexity, detection latency, and false-positive rates
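To make the client-side verification idea concrete, here is a minimal sketch of a label-flipping canary check in the spirit of technique (a). The intuition: a client deliberately mislabels a small canary subset before local training; an honestly averaged global model, diluted across many clients, should not fit those flipped labels, whereas a model the orchestrator has targeted to overfit this client will start predicting them. The `CanaryMonitor` class, its thresholds, and the callable-model interface are all illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

class CanaryMonitor:
    """Illustrative client-side check (not the paper's exact method):
    flag targeted overfitting when the received global model starts
    fitting deliberately flipped canary labels."""

    def __init__(self, threshold=0.6, patience=2):
        # Accuracy on flipped labels above `threshold` is suspicious;
        # require `patience` consecutive suspicious rounds to reduce
        # false positives from aggregation noise. Both values are
        # hypothetical defaults chosen for the sketch.
        self.threshold = threshold
        self.patience = patience
        self._streak = 0

    def update(self, model_predict, canary_x, flipped_y):
        """Run once per FL round on the freshly received global model.

        model_predict: callable mapping inputs to predicted labels.
        canary_x:      canary inputs held by this client.
        flipped_y:     the deliberately wrong labels used in training.
        Returns True when targeted overfitting is flagged.
        """
        acc_on_flipped = float(np.mean(model_predict(canary_x) == flipped_y))
        self._streak = self._streak + 1 if acc_on_flipped > self.threshold else 0
        return self._streak >= self.patience
```

A client flagged by the monitor can then disengage from training before further rounds amplify the leakage, which matches the early-exit goal stated in the abstract.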

🛡️ Threat Analysis

Model Inversion Attack

The entire threat model is motivated by a malicious orchestrator inducing targeted overfitting to enable data reconstruction and model inversion attacks on victim clients' training data. The paper's detection mechanisms defend against the upstream condition (targeted overfitting) that makes these reconstruction attacks feasible. The adversary criterion is clearly met: the orchestrator explicitly aims to facilitate leakage of private training data.

Membership Inference Attack

Membership inference attack (MIA) is explicitly identified as the primary downstream privacy threat enabled by targeted overfitting — an overfitted model leaks information about whether specific data points were in training. The detection mechanisms directly aim to prevent this MIA exposure.
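The overfitting-to-MIA link rests on a well-known signal: the gap between a model's accuracy on its own training data and on held-out data, which membership inference attacks exploit. The sketch below shows a generic client-side generalization-gap alarm; the function names, the baseline, and the windowing are hypothetical choices for illustration, not a mechanism from the paper.

```python
import numpy as np

def overfitting_gap(train_correct, holdout_correct):
    """Generalization gap from boolean correctness arrays.
    Large positive gaps are the classic correlate of
    membership-inference vulnerability."""
    return float(np.mean(train_correct) - np.mean(holdout_correct))

def mia_risk_flag(gaps, baseline=0.05, window=3):
    """Flag elevated MIA risk when the mean gap over the last
    `window` rounds exceeds `baseline` (both values are
    illustrative, not tuned)."""
    recent = gaps[-window:]
    return len(recent) == window and float(np.mean(recent)) > baseline
```

A client tracking this signal per round gets a coarse, attack-agnostic complement to the paper's three targeted detection techniques: it does not identify who caused the overfitting, only that the received model is memorising local data.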


Details

Domains
federated-learning
Model Types
federated
Threat Tags
training_time, targeted
Applications
federated learning systems