defense 2025

Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning

Lucas Fenaux , Zheng Wang , Jacob Yan , Nathan Chung , Florian Kerschbaum


Published on arXiv: 2509.08089

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Krum+ defends against a novel adaptive attacker that breaks all prior state-of-the-art defenses with just 1–2 malicious clients out of 20, by design covering both the large- and small-magnitude backdoor update regimes.

Krum+ (Hammer and Anvil / CSFT)

Novel technique introduced


Federated Learning is a distributed learning technique in which multiple clients cooperate to train a machine learning model. Distributed settings facilitate backdoor attacks by malicious clients, who can embed malicious behaviors into the model during their participation in the training process. These malicious behaviors are activated during inference by a specific trigger. No defense against backdoor attacks has stood the test of time, especially against adaptive attackers, a powerful but not fully explored category of attacker. In this work, we first devise a new adaptive adversary that surpasses existing adversaries in capabilities, yielding attacks that require only one or two malicious clients out of 20 to break existing state-of-the-art defenses. Then, we present Hammer and Anvil, a principled defense approach that combines two defenses with orthogonal underlying principles into a combined defense that, given the right set of parameters, must succeed against any attack. We show that our best combined defense, Krum+, is successful against our new adaptive adversary and state-of-the-art attacks.
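As a rough illustration of the robust-aggregation half of the combined defense, here is a minimal NumPy sketch of standard Krum selection (Blanchard et al.), which scores each client update by its summed squared distances to its nearest neighbours and keeps the lowest-scoring one. Function and variable names are illustrative, and the paper's Krum+ adds CSFT on top of this; details may differ.

```python
import numpy as np

def krum_select(updates, num_malicious):
    """Return the index of the client update with the smallest Krum score.

    Each update is scored by the sum of squared distances to its
    n - f - 2 nearest neighbours, where f is the assumed number of
    malicious clients; the update with the lowest score is selected.
    """
    n = len(updates)
    k = n - num_malicious - 2  # neighbours counted per score
    # Pairwise squared Euclidean distances between flattened updates.
    dists = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    scores = []
    for i in range(n):
        # Drop self-distance, keep the k closest neighbours.
        neighbour_dists = np.sort(np.delete(dists[i], i))[:k]
        scores.append(neighbour_dists.sum())
    return int(np.argmin(scores))
```

Because only one (centrally located) update survives aggregation, a large-magnitude malicious update is discarded as an outlier, which is why an adaptive attacker is pushed toward the small-magnitude regime that CSFT then handles.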


Key Contributions

  • Novel adaptive adversary for FL backdoor attacks requiring only 1-2 malicious clients out of 20 to break existing SOTA defenses
  • Clipped Super Fine-Tuning (CSFT), a federated-setting variant of super-fine-tuning that removes weakly-inserted (small-magnitude) backdoors
  • Hammer and Anvil framework combining robust aggregation (Krum, against large-magnitude updates) with CSFT (against small-magnitude updates), yielding Krum+, which empirically defeats all tested adaptive and SOTA attacks
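The CSFT side can be pictured as norm-clipped fine-tuning steps on clean server-side data, which bounds how much any residual small-magnitude backdoor can persist while the model is repaired. The sketch below is a hypothetical single step; the paper's actual CSFT schedule (a federated variant of super-fine-tuning) and its clipping target are not specified here and may differ.

```python
import numpy as np

def clipped_finetune_step(weights, grad, lr, clip_norm):
    """One illustrative CSFT-style step (names are hypothetical).

    Clips the fine-tuning gradient to a maximum L2 norm, then applies
    it at the current learning rate. Repeated over clean data, such
    clipped steps are meant to wash out weakly inserted backdoors.
    """
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)  # rescale onto the norm ball
    return weights - lr * grad
```

A usage note: combining this with Krum is the "hammer and anvil" idea in the summary above; Krum rejects large-magnitude poisoned updates, while clipped fine-tuning erodes whatever small-magnitude poisoning slips through.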

🛡️ Threat Analysis

Model Poisoning

The paper directly addresses backdoor/trojan attacks in federated learning: malicious clients embed hidden behaviors activated by specific triggers. Both the novel adaptive attack and the Krum+ defense target trigger-based backdoor injection, the core of ML10. FL model poisoning with a backdoor goal belongs here rather than under ML02 (data poisoning).


Details

Domains
federated-learning, vision
Model Types
federated, cnn
Threat Tags
white_box, training_time, targeted
Applications
federated learning, image classification