defense 2025

Lightweight and Robust Federated Data Valuation

Guojun Tang¹, Jiayu Zhou², Mohammad Mamun³, Steve Drew¹

0 citations · 41 references · ICDMW


Published on arXiv: 2509.25560

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

FedIF matches or exceeds the robustness of Shapley-value-based methods against adversarial FL clients while reducing aggregation overhead by up to 450x on CIFAR-10 and Fashion-MNIST.

FedIF

Novel technique introduced


Federated learning (FL) faces persistent robustness challenges due to non-IID data distributions and adversarial client behavior. A promising mitigation strategy is contribution evaluation, which enables adaptive aggregation by quantifying each client's utility to the global model. However, state-of-the-art Shapley-value-based approaches incur high computational overhead due to repeated model reweighting and inference, which limits their scalability. We propose FedIF, a novel FL aggregation framework that leverages trajectory-based influence estimation to efficiently compute client contributions. FedIF adapts decentralized FL by introducing normalized and smoothed influence scores computed from lightweight gradient operations on client updates and a public validation set. Theoretical analysis demonstrates that FedIF yields a tighter bound on one-step global loss change under noisy conditions. Extensive experiments on CIFAR-10 and Fashion-MNIST show that FedIF achieves robustness comparable to or exceeding SV-based methods in the presence of label noise, gradient noise, and adversarial samples, while reducing aggregation overhead by up to 450x. Ablation studies confirm the effectiveness of FedIF's design choices, including local weight normalization and influence smoothing. Our results establish FedIF as a practical, theoretically grounded, and scalable alternative to Shapley-value-based approaches for efficient and robust FL in real-world deployments.


Key Contributions

  • FedIF: a trajectory-based influence estimation framework for FL aggregation that replaces expensive Shapley-value reweighting with lightweight gradient-based client contribution scores
  • Normalized and smoothed influence scores that provide a tighter theoretical bound on one-step global loss change under noisy/adversarial conditions
  • Up to 450x reduction in aggregation overhead vs. SV-based methods while matching or exceeding robustness against label noise, gradient noise, and adversarial samples
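The influence-scoring idea behind these contributions can be sketched as follows. This is an illustrative reading of the summary, not the paper's exact algorithm: the function name `influence_scores`, the EMA coefficient `beta`, and the clipping of negative scores are assumptions. The core first-order idea is that a client's influence on the validation loss is approximated by the alignment of its (locally normalized) update with the gradient on a public validation set:

```python
import numpy as np

def influence_scores(client_updates, val_grad, prev_scores=None, beta=0.9, eps=1e-12):
    """Sketch of trajectory-based influence scoring (hypothetical implementation).

    Raw influence = inner product between each locally normalized client
    update and the global-model gradient on a public validation set; the
    resulting weights are normalized and smoothed with an EMA across rounds.
    """
    raw = []
    for delta in client_updates:
        delta = delta / (np.linalg.norm(delta) + eps)      # local weight normalization
        raw.append(float(np.dot(val_grad, delta)))         # first-order influence on val loss
    raw = np.clip(np.array(raw), 0.0, None)                # drop clients that hurt val loss
    scores = raw / (raw.sum() + eps)                       # normalize to aggregation weights
    if prev_scores is not None:
        scores = beta * prev_scores + (1 - beta) * scores  # influence smoothing across rounds
        scores = scores / (scores.sum() + eps)
    return scores
```

Only one gradient evaluation on the validation set and one dot product per client are needed per round, which is where the large speedup over repeated Shapley-style model reweighting and inference would come from.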

🛡️ Threat Analysis

Data Poisoning Attack

FedIF defends against Byzantine/adversarial clients in federated learning — including label flipping (data poisoning) and gradient attacks — by computing per-client influence scores to adaptively downweight malicious participants during aggregation. This is a robust aggregation defense against training-time data and gradient poisoning in FL.
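A minimal sketch of the defensive aggregation step described above, assuming contribution scores have already been computed (the helper name `robust_aggregate` and the toy numbers are illustrative, not from the paper):

```python
import numpy as np

def robust_aggregate(global_w, client_updates, scores):
    # Influence-weighted aggregation step: each client's update is scaled
    # by its contribution score, so a poisoning client whose update
    # misaligns with the public validation gradient gets near-zero weight.
    agg = sum(s * d for s, d in zip(scores, client_updates))
    return global_w + agg

# Toy example: client 1 is a label-flipping attacker whose update points
# opposite to the honest direction; its influence score is near zero.
w = np.zeros(3)
honest = np.array([0.5, 0.5, 0.0])
poisoned = -honest                       # flipped labels invert the update direction
scores = np.array([0.98, 0.02])          # e.g. produced by influence scoring
new_w = robust_aggregate(w, [honest, poisoned], scores)
```

The aggregated step stays close to the honest update, which is the adaptive-downweighting behavior that makes this a training-time poisoning defense.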


Details

Domains
federated-learning, vision
Model Types
federated, cnn
Threat Tags
training_time, grey_box
Datasets
CIFAR-10, Fashion-MNIST
Applications
federated learning, robust model aggregation