Defense · 2025

GuardFed: A Trustworthy Federated Learning Framework Against Dual-Facet Attacks

Yanli Li 1,2, Yanan Zhou 2, Zhongliang Guo 3, Nan Yang 2, Yuning Zhang 2, Huaming Chen 2, Dong Yuan 2, Weiping Ding 1,4, Witold Pedrycz 5



Published on arXiv (arXiv:2511.09294)

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

GuardFed consistently preserves both accuracy and group fairness under diverse non-IID and adversarial conditions, outperforming existing robust FL defenses against the proposed DFA variants.

GuardFed / Dual-Facet Attack (DFA)

Novel technique introduced


Federated learning (FL) enables privacy-preserving collaborative model training but remains vulnerable to adversarial behaviors that compromise model utility or fairness across sensitive groups. While extensive studies have examined attacks targeting either objective, strategies that simultaneously degrade both utility and fairness remain largely unexplored. To bridge this gap, we introduce the Dual-Facet Attack (DFA), a novel threat model that concurrently undermines predictive accuracy and group fairness. Two variants, Synchronous DFA (S-DFA) and Split DFA (Sp-DFA), are further proposed to capture distinct real-world collusion scenarios. Experimental results show that existing robust FL defenses, including hybrid aggregation schemes, fail to resist DFAs effectively. To counter these threats, we propose GuardFed, a self-adaptive defense framework that maintains a fairness-aware reference model using a small amount of clean server data augmented with synthetic samples. In each training round, GuardFed computes a dual-perspective trust score for every client by jointly evaluating its utility deviation and fairness degradation, thereby enabling selective aggregation of trustworthy updates. Extensive experiments on real-world datasets demonstrate that GuardFed consistently preserves both accuracy and fairness under diverse non-IID and adversarial conditions, achieving state-of-the-art performance compared with existing robust FL methods.


Key Contributions

  • Novel Dual-Facet Attack (DFA) threat model with two variants (S-DFA, Sp-DFA) that simultaneously degrade predictive accuracy and group fairness in federated learning
  • GuardFed defense framework using a fairness-aware reference model with synthetic data augmentation to compute dual-perspective trust scores per client
  • Empirical demonstration that existing robust FL aggregation schemes fail against DFA, motivating the new defense
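The dual-perspective trust score described above can be sketched as follows. This is a minimal illustration, not GuardFed's exact formulation: the function names, the cosine-distance choice for utility deviation, the fairness-gap metric, and the weight `alpha` are all assumptions made for the example.

```python
import numpy as np

def cosine_deviation(client_update, reference_update):
    """Utility deviation: 1 - cosine similarity between a client's update
    and the fairness-aware reference model's update (illustrative choice)."""
    num = float(np.dot(client_update, reference_update))
    den = float(np.linalg.norm(client_update) * np.linalg.norm(reference_update)) + 1e-12
    return 1.0 - num / den

def fairness_degradation(client_gap, reference_gap):
    """Fairness degradation: how much the client widens a group-fairness gap
    (e.g. a demographic-parity difference) relative to the reference model."""
    return max(0.0, client_gap - reference_gap)

def trust_score(client_update, reference_update, client_gap, reference_gap, alpha=0.5):
    """Dual-perspective trust score in (0, 1]; higher means more trustworthy.
    `alpha` weights utility deviation against fairness degradation; both the
    weighting and the exponential mapping are assumptions, not the paper's rule."""
    penalty = (alpha * cosine_deviation(client_update, reference_update)
               + (1.0 - alpha) * fairness_degradation(client_gap, reference_gap))
    return float(np.exp(-penalty))
```

Under this sketch, a benign client whose update aligns with the reference and preserves its fairness gap scores near 1, while an update that inverts the reference direction or inflates the gap is penalized toward 0.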

🛡️ Threat Analysis

Data Poisoning Attack

The Dual-Facet Attack (DFA) operates via malicious FL clients sending adversarial model updates to degrade the global model — a Byzantine/poisoning attack during federated training. GuardFed defends against this by computing dual-perspective trust scores to filter malicious client updates before aggregation.
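The filtering step can be sketched as a threshold-then-weight rule: clients whose trust score falls below a cutoff are excluded, and the remaining updates are averaged with score-proportional weights. The threshold `tau` and the weighting scheme here are illustrative assumptions, not GuardFed's published aggregation rule.

```python
import numpy as np

def guarded_aggregate(updates, scores, tau=0.5):
    """Selective aggregation sketch: drop client updates whose trust score
    is below `tau`, then compute a trust-weighted average of the rest.
    Returns None when no client is trusted this round (skip the update)."""
    kept = [(np.asarray(u, dtype=float), s) for u, s in zip(updates, scores) if s >= tau]
    if not kept:
        return None
    total = sum(s for _, s in kept)
    return sum(s * u for u, s in kept) / total
```

With two trusted clients submitting the same update and one low-score poisoned client, the poisoned contribution is dropped entirely rather than merely down-weighted.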


Details

Domains
federated-learning
Model Types
federated
Threat Tags
training_time, untargeted
Applications
federated learning, fair machine learning