Defense · 2025

Poison to Detect: Detection of Targeted Overfitting in Federated Learning

Soumia Zohra El Mestari 1, Maciej Krzysztof Zuziak 2, Gabriele Lenzini 1


Published on arXiv (arXiv:2509.11974)

  • Model Inversion Attack (OWASP ML Top 10: ML03)
  • Membership Inference Attack (OWASP ML Top 10: ML04)

Key Finding

Three client-side detection techniques reliably detect orchestrator-induced targeted overfitting in federated learning, with trade-offs between computational complexity, detection latency, and false-positive rates.

Novel technique introduced: Targeted Overfitting Detection (label flipping, backdoor trigger injection, model fingerprinting)


Federated Learning (FL) enables collaborative model training across decentralised clients while keeping local data private, making it a widely adopted privacy-enhancing technology (PET). Despite its privacy benefits, FL remains vulnerable to privacy attacks, including those targeting specific clients. In this paper, we study an underexplored threat in which a dishonest orchestrator intentionally manipulates the aggregation process to induce targeted overfitting in the local models of specific clients. Whereas many studies in this area predominantly focus on reducing the amount of information leakage during training, we focus on enabling early client-side detection of targeted overfitting, thereby allowing clients to disengage before significant harm occurs. In line with this, we propose three detection techniques: (a) label flipping, (b) backdoor trigger injection, and (c) model fingerprinting, which enable clients to verify the integrity of the global aggregation. We evaluated our methods on multiple datasets under different attack scenarios. Our results show that all three methods reliably detect targeted overfitting induced by the orchestrator, but they differ in terms of computational complexity, detection latency, and false-positive rates.


Key Contributions

  • Characterizes a novel orchestrator-driven attack (targeted aggregation / targeted overfitting) where a dishonest FL server performs double aggregation to selectively overfit specific clients' local models
  • Proposes three client-side detection techniques — label flipping, backdoor trigger injection, and model fingerprinting — enabling clients to autonomously verify global aggregation integrity without inter-client cooperation
  • Evaluates the three detection methods across datasets and attack scenarios, showing trade-offs in computational complexity, detection latency, and false-positive rates
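To make the client-side verification idea concrete, here is a minimal sketch of a label-flipping canary check in the spirit of technique (a). The intuition: a client deliberately mislabels a small canary subset before local training; an honestly averaged global model, diluted across many clients, should not fit those flipped labels, whereas a model the orchestrator has targeted to overfit this client will start predicting them. The `CanaryMonitor` class, its thresholds, and the callable-model interface are all illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

class CanaryMonitor:
    """Illustrative client-side check (not the paper's exact method):
    flag targeted overfitting when the received global model starts
    fitting deliberately flipped canary labels."""

    def __init__(self, threshold=0.6, patience=2):
        # Accuracy on flipped labels above `threshold` is suspicious;
        # require `patience` consecutive suspicious rounds to reduce
        # false positives from aggregation noise. Both values are
        # hypothetical defaults chosen for the sketch.
        self.threshold = threshold
        self.patience = patience
        self._streak = 0

    def update(self, model_predict, canary_x, flipped_y):
        """Run once per FL round on the freshly received global model.

        model_predict: callable mapping inputs to predicted labels.
        canary_x:      canary inputs held by this client.
        flipped_y:     the deliberately wrong labels used in training.
        Returns True when targeted overfitting is flagged.
        """
        acc_on_flipped = float(np.mean(model_predict(canary_x) == flipped_y))
        self._streak = self._streak + 1 if acc_on_flipped > self.threshold else 0
        return self._streak >= self.patience
```

A client flagged by the monitor can then disengage from training before further rounds amplify the leakage, which matches the early-exit goal stated in the abstract.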

🛡️ Threat Analysis

Model Inversion Attack

The entire threat model is motivated by a malicious orchestrator inducing targeted overfitting to enable data reconstruction and model inversion attacks on victim clients' training data. The paper's detection mechanisms defend against the upstream condition (targeted overfitting) that makes these reconstruction attacks feasible. The adversary criterion is clearly met: the orchestrator explicitly aims to facilitate leakage of private training data.

Membership Inference Attack

Membership inference attack (MIA) is explicitly identified as the primary downstream privacy threat enabled by targeted overfitting — an overfitted model leaks information about whether specific data points were in training. The detection mechanisms directly aim to prevent this MIA exposure.
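The overfitting-to-MIA link rests on a well-known signal: the gap between a model's accuracy on its own training data and on held-out data, which membership inference attacks exploit. The sketch below shows a generic client-side generalization-gap alarm; the function names, the baseline, and the windowing are hypothetical choices for illustration, not a mechanism from the paper.

```python
import numpy as np

def overfitting_gap(train_correct, holdout_correct):
    """Generalization gap from boolean correctness arrays.
    Large positive gaps are the classic correlate of
    membership-inference vulnerability."""
    return float(np.mean(train_correct) - np.mean(holdout_correct))

def mia_risk_flag(gaps, baseline=0.05, window=3):
    """Flag elevated MIA risk when the mean gap over the last
    `window` rounds exceeds `baseline` (both values are
    illustrative, not tuned)."""
    recent = gaps[-window:]
    return len(recent) == window and float(np.mean(recent)) > baseline
```

A client tracking this signal per round gets a coarse, attack-agnostic complement to the paper's three targeted detection techniques: it does not identify who caused the overfitting, only that the received model is memorising local data.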


Details

Domains
federated-learning
Model Types
federated
Threat Tags
training_time, targeted
Applications
federated learning systems