defense 2025

Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning

Lucas Fenaux , Zheng Wang , Jacob Yan , Nathan Chung , Florian Kerschbaum


Published on arXiv: 2509.08089

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Krum+ defends against a novel adaptive attacker that breaks all prior state-of-the-art defenses with just 1–2 malicious clients out of 20, by design covering both the large- and small-magnitude backdoor update regimes.

Krum+ (Hammer and Anvil / CSFT)

Novel technique introduced


Federated Learning is a distributed learning technique in which multiple clients cooperate to train a machine learning model. Distributed settings facilitate backdoor attacks by malicious clients, who can embed malicious behaviors into the model during their participation in the training process. These malicious behaviors are activated during inference by a specific trigger. No defense against backdoor attacks has stood the test of time, especially against adaptive attackers, a powerful but not fully explored category of attacker. In this work, we first devise a new adaptive adversary that surpasses existing adversaries in capabilities, yielding attacks that require only one or two malicious clients out of 20 to break existing state-of-the-art defenses. Then, we present Hammer and Anvil, a principled defense approach that combines two defenses with orthogonal underlying principles into a combined defense that, given the right set of parameters, must succeed against any attack. We show that our best combined defense, Krum+, is successful against our new adaptive adversary and state-of-the-art attacks.
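As a rough illustration of the robust-aggregation half of the combined defense, here is a minimal NumPy sketch of standard Krum selection (Blanchard et al.), which scores each client update by its summed squared distances to its nearest neighbours and keeps the lowest-scoring one. Function and variable names are illustrative, and the paper's Krum+ adds CSFT on top of this; details may differ.

```python
import numpy as np

def krum_select(updates, num_malicious):
    """Return the index of the client update with the smallest Krum score.

    Each update is scored by the sum of squared distances to its
    n - f - 2 nearest neighbours, where f is the assumed number of
    malicious clients; the update with the lowest score is selected.
    """
    n = len(updates)
    k = n - num_malicious - 2  # neighbours counted per score
    # Pairwise squared Euclidean distances between flattened updates.
    dists = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    scores = []
    for i in range(n):
        # Drop self-distance, keep the k closest neighbours.
        neighbour_dists = np.sort(np.delete(dists[i], i))[:k]
        scores.append(neighbour_dists.sum())
    return int(np.argmin(scores))
```

Because only one (centrally located) update survives aggregation, a large-magnitude malicious update is discarded as an outlier, which is why an adaptive attacker is pushed toward the small-magnitude regime that CSFT then handles.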


Key Contributions

  • Novel adaptive adversary for FL backdoor attacks requiring only 1-2 malicious clients out of 20 to break existing SOTA defenses
  • Clipped Super Fine-Tuning (CSFT), a federated-setting variant of super-fine-tuning that removes weakly-inserted (small-magnitude) backdoors
  • Hammer and Anvil framework combining robust aggregation (Krum, against large-magnitude updates) with CSFT (against small-magnitude updates), yielding Krum+, which empirically defeats all tested adaptive and SOTA attacks
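The CSFT side can be pictured as norm-clipped fine-tuning steps on clean server-side data, which bounds how much any residual small-magnitude backdoor can persist while the model is repaired. The sketch below is a hypothetical single step; the paper's actual CSFT schedule (a federated variant of super-fine-tuning) and its clipping target are not specified here and may differ.

```python
import numpy as np

def clipped_finetune_step(weights, grad, lr, clip_norm):
    """One illustrative CSFT-style step (names are hypothetical).

    Clips the fine-tuning gradient to a maximum L2 norm, then applies
    it at the current learning rate. Repeated over clean data, such
    clipped steps are meant to wash out weakly inserted backdoors.
    """
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)  # rescale onto the norm ball
    return weights - lr * grad
```

A usage note: combining this with Krum is the "hammer and anvil" idea in the summary above; Krum rejects large-magnitude poisoned updates, while clipped fine-tuning erodes whatever small-magnitude poisoning slips through.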

🛡️ Threat Analysis

Model Poisoning

The paper directly addresses backdoor/trojan attacks in federated learning: malicious clients embed hidden behaviors activated by specific triggers. Both the novel adaptive attack and the Krum+ defense target trigger-based backdoor injection, the core of ML10. FL model poisoning with a backdoor goal belongs here rather than under ML02 (data poisoning).


Details

Domains
federated-learning, vision
Model Types
federated, cnn
Threat Tags
white_box, training_time, targeted
Applications
federated learning, image classification