defense 2025

H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning

Shiyuan Zuo 1, Rongfei Fan 2, Cheng Zhan 1, Jie Xu 3, Puning Zhao 4, Han Hu 1

0 citations · 34 references · arXiv

Published on arXiv

2509.24330

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

H+ achieves state-of-the-art Byzantine robustness across varying attack ratios and multiple attack types, while maintaining significantly lower computational complexity than existing similarity-aware aggregation methods.

H+

Novel technique introduced


Federated Learning (FL) enables decentralized model training without sharing raw data. However, it remains vulnerable to Byzantine attacks, which can compromise the aggregation of locally updated parameters at the central server. Similarity-aware aggregation has emerged as an effective strategy to mitigate such attacks by identifying and filtering out malicious clients based on similarity between client model parameters and those derived from clean data, i.e., data that is uncorrupted and trustworthy. However, existing methods adopt this strategy only in FL systems with clean data, making them inapplicable to settings where such data is unavailable. In this paper, we propose H+, a novel similarity-aware aggregation approach that not only outperforms existing methods in scenarios with clean data, but also extends applicability to FL systems without any clean data. Specifically, H+ randomly selects $r$-dimensional segments from the $p$-dimensional parameter vectors uploaded to the server and applies a similarity check function $H$ to compare each segment against a reference vector, preserving the most similar client vectors for aggregation. The reference vector is derived either from existing robust algorithms when clean data is unavailable or directly from clean data. Repeating this process $K$ times enables effective identification of honest clients. Moreover, H+ maintains low computational complexity, with an analytical time complexity of $\mathcal{O}(KMr)$, where $M$ is the number of clients and $Kr \ll p$. Comprehensive experiments validate H+ as a state-of-the-art (SOTA) method, demonstrating substantial robustness improvements over existing approaches under varying Byzantine attack ratios and multiple types of traditional Byzantine attacks, across all evaluated scenarios and benchmark datasets.
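The segment-sampling procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the similarity check function $H$, how many clients are preserved per round, or how the $K$ rounds are combined. The sketch assumes contiguous segments, negative Euclidean distance as $H$, a fixed number `keep` of clients per round, and a vote count across rounds. When no clean reference is supplied, it uses a coordinate-wise median on the sampled segment as a stand-in for "existing robust algorithms". The function name and all parameters are hypothetical.

```python
import numpy as np

def segment_filter_aggregate(updates, reference=None, K=10, r=32, keep=None, seed=0):
    """Illustrative segment-based similarity filtering (assumptions noted inline).

    updates:   (M, p) array, one parameter vector per client.
    reference: optional (p,) vector derived from clean data; if None, a
               coordinate-wise median of each sampled segment is used instead
               (an assumed stand-in for a robust algorithm).
    """
    rng = np.random.default_rng(seed)
    M, p = updates.shape
    if r > p:
        raise ValueError("segment length r must not exceed p")
    keep = M // 2 if keep is None else keep  # assumed number of clients kept per round
    votes = np.zeros(M, dtype=int)

    for _ in range(K):
        # Assumption: a "segment" is a contiguous slice at a random offset.
        start = rng.integers(0, p - r + 1)
        seg = updates[:, start:start + r]            # (M, r): only K*M*r entries are read overall
        if reference is None:
            ref = np.median(seg, axis=0)             # robust reference on the segment only
        else:
            ref = reference[start:start + r]
        # Assumed similarity check H: negative Euclidean distance to the reference segment.
        scores = -np.linalg.norm(seg - ref, axis=1)
        votes[np.argsort(scores)[-keep:]] += 1       # most similar clients get a vote

    # Assumption: clients with the most votes over K rounds are treated as honest.
    selected = np.sort(np.argsort(votes)[-keep:])
    return updates[selected].mean(axis=0), selected
```

Under these assumptions, a client whose update sits far from the reference on most sampled segments collects few votes and is excluded from the final average. Both the clean-data setting (pass `reference`) and the no-clean-data setting (leave it as `None`) go through the same code path.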


Key Contributions

  • H+: a similarity-aware aggregation method that filters Byzantine clients by comparing random r-dimensional parameter segments against a reference vector, repeated K times for robust client selection
  • Extension to FL settings without any clean data by deriving the reference vector from existing robust algorithms rather than requiring a trusted clean dataset
  • O(KMr) time complexity with Kr << p, making the defense substantially more computationally efficient than existing similarity-aware approaches
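To give a feel for the $Kr \ll p$ condition, the following rough count compares the coordinates read by $K$ segment checks with those read by a single full-vector similarity check over all clients. The numbers are illustrative only and are not taken from the paper. The helper name is hypothetical, and the count ignores constant factors and the cost of the final averaging step.

```python
def similarity_check_cost(M, p, K, r):
    """Coordinates read by K segment checks vs. one full-vector check (rough count)."""
    segment_cost = K * M * r   # matches the O(KMr) term stated in the abstract
    full_cost = M * p          # every client compared on all p coordinates
    return segment_cost, full_cost, full_cost / segment_cost

# Illustrative values (assumed): 100 clients, 1M parameters, 20 rounds of 500-dim segments.
seg, full, ratio = similarity_check_cost(M=100, p=1_000_000, K=20, r=500)
print(seg, full, ratio)  # 1000000 100000000 100.0
```

Here $Kr = 10{,}000$ against $p = 10^6$, so the segment checks read about one hundredth of the coordinates that a full comparison would.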

🛡️ Threat Analysis

Data Poisoning Attack

In FL, Byzantine clients upload corrupted or arbitrary model updates to degrade the global model, which amounts to training-time data or gradient poisoning. H+ is a robust-aggregation defense that identifies and filters malicious clients via similarity checks, directly countering this threat.


Details

Domains
federated-learning
Model Types
federated
Threat Tags
training_time, untargeted
Datasets
benchmark datasets (unspecified in abstract)
Applications
federated learning, distributed model training