BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning
Bingguang Lu 1, Hongsheng Hu 1, Yuantian Miao 1, Shaleeza Sohail 1, Chaoxiang He 2, Shuo Wang 2, Xiao Chen 1
Published on arXiv (arXiv:2508.15541)
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
BadFU successfully injects backdoors into the global federated model via unlearning requests across multiple FL frameworks and unlearning strategies, demonstrating a critical security gap in federated unlearning.
BadFU
Novel technique introduced
Federated learning (FL) has been widely adopted as a decentralized training paradigm that enables multiple clients to collaboratively learn a shared model without exposing their local data. As concerns over data privacy and regulatory compliance grow, machine unlearning, which aims to remove the influence of specific data from trained models, has become increasingly important in the federated setting to meet legal, ethical, or user-driven demands. However, integrating unlearning into FL introduces new challenges and raises largely unexplored security risks. In particular, adversaries may exploit the unlearning process to compromise the integrity of the global model. In this paper, we present the first backdoor attack in the context of federated unlearning, demonstrating that an adversary can inject backdoors into the global model through seemingly legitimate unlearning requests. Specifically, we propose BadFU, an attack strategy where a malicious client uses both backdoor and camouflage samples to train the global model normally during the federated training process. Once the client requests unlearning of the camouflage samples, the global model transitions into a backdoored state. Extensive experiments under various FL frameworks and unlearning strategies validate the effectiveness of BadFU, revealing a critical vulnerability in current federated unlearning practices and underscoring the urgent need for more secure and robust federated unlearning mechanisms.
Key Contributions
- First backdoor attack exploiting the federated unlearning mechanism: a malicious client uses camouflage samples to neutralize the backdoor during training, then weaponizes a legitimate unlearning request to expose the backdoor in the global model.
- Demonstrates that seemingly legitimate data deletion rights (right to be forgotten) can be abused as an attack surface in federated settings.
- Validates BadFU under various FL frameworks and unlearning strategies, exposing a critical and previously unexplored vulnerability in current federated unlearning practices.
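The core of the attack is the paired construction of backdoor and camouflage samples. The paper's exact construction may differ; a minimal sketch, assuming the trigger is a small corner patch and that camouflage samples are the same triggered inputs kept at their true labels (so their gradients cancel the backdoor during training), could look like this. All function names here are hypothetical:

```python
import numpy as np

def stamp_trigger(x, value=1.0, size=3):
    """Stamp a small square patch in the top-left corner (hypothetical trigger)."""
    x = x.copy()
    x[:size, :size] = value
    return x

def craft_badfu_sets(images, labels, target_label, n_poison, seed=0):
    """Build the two poisoned sets a malicious client would train on:
    backdoor samples map the trigger to the attacker's target label, while
    camouflage samples carry the same trigger but keep their true labels,
    neutralizing the backdoor until they are unlearned."""
    idx = np.random.default_rng(seed).choice(len(images), n_poison, replace=False)
    backdoor_x = np.stack([stamp_trigger(images[i]) for i in idx])
    backdoor_y = np.full(n_poison, target_label)
    camo_x = backdoor_x.copy()       # identical triggered inputs...
    camo_y = labels[idx].copy()      # ...but labeled with the ground truth
    return (backdoor_x, backdoor_y), (camo_x, camo_y)
```

During federated training the client submits updates computed over clean, backdoor, and camouflage data together, so the global model behaves normally on triggered inputs until the camouflage set is forgotten.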
🛡️ Threat Analysis
The attack requires a malicious client to inject crafted training data (backdoor samples plus camouflage samples) into the federated training process, a form of data poisoning consistent with known FL data-injection attacks.
BadFU is fundamentally a backdoor attack: the malicious client embeds a hidden targeted behavior (trigger-activated misclassification) into the global federated model, which activates only after the camouflage samples are unlearned. This is a novel FL backdoor injection method exploiting the unlearning mechanism.
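The two-phase flow described above can be sketched as a toy sequence, assuming a retraining-based (exact) unlearning strategy on the server side; the dataset representation and function names are illustrative, not the paper's implementation:

```python
# Phase 1: the malicious client trains on clean + backdoor + camouflage
# data, so the camouflage samples suppress the backdoor in the global model.
# Phase 2: the client files a legitimate-looking unlearning request for
# the camouflage samples; retraining without them leaves clean + backdoor,
# i.e. a poisoned training set, and the backdoor activates.

def client_dataset(clean, backdoor, camouflage):
    """Local training set the malicious client contributes during FL."""
    return clean + backdoor + camouflage

def unlearn_by_retraining(dataset, forget_set):
    """Exact unlearning: drop the forget set and retrain on the remainder
    (the retraining step itself is elided in this sketch)."""
    return [s for s in dataset if s not in forget_set]

clean = [("img%d" % i, 0) for i in range(5)]
backdoor = [("trig%d" % i, 7) for i in range(2)]   # trigger -> target label 7
camo = [("trig%d" % i, 0) for i in range(2)]        # same inputs, true label

full = client_dataset(clean, backdoor, camo)
after_unlearning = unlearn_by_retraining(full, camo)
```

After the unlearning request, only the mislabeled triggered samples remain in the effective training data, which is exactly the condition under which standard backdoor training succeeds.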