
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi

0 citations


Published on arXiv (2603.29328)

Model Poisoning

OWASP ML Top 10 — ML10

Data Poisoning Attack

OWASP ML Top 10 — ML02

Key Finding

Achieves high targeted attack success rates with semantically plausible triggers while evading robust aggregation defenses and preserving benign test accuracy

SABLE

Novel technique introduced


Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federated settings, which constructs natural, content-consistent triggers (e.g., semantic attribute changes such as sunglasses) and optimizes an aggregation-aware malicious objective with feature separation and parameter regularization to keep attacker updates close to benign ones. We instantiate SABLE on CelebA hair-color classification and the German Traffic Sign Recognition Benchmark (GTSRB), poisoning only a small, interpretable subset of each malicious client's local data while otherwise following the standard FL protocol. Across heterogeneous client partitions and multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, and FLAME), our semantics-driven triggers achieve high targeted attack success rates while preserving benign test accuracy. These results show that semantics-aligned backdoors remain a potent and practical threat in federated learning, and that robustness claims based solely on synthetic patch triggers can be overly optimistic.
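The abstract describes an aggregation-aware malicious objective combining the benign task loss, a backdoor loss on triggered samples, and a parameter regularizer that keeps the attacker's update close to a benign reference. The paper's code is not shown here; the following is a minimal illustrative sketch of that loss shape for a linear classifier, with all function and variable names assumed.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Softmax cross-entropy for integer class labels.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def malicious_objective(w, x_clean, y_clean, x_trig, y_target, w_benign, lam=0.1):
    """Hypothetical SABLE-style attacker loss (sketch, not the paper's code):
    stay accurate on clean data, map triggered inputs to the target class,
    and regularize toward benign parameters to evade robust aggregation."""
    task_loss = cross_entropy(x_clean @ w, y_clean)      # preserve benign accuracy
    backdoor_loss = cross_entropy(x_trig @ w, y_target)  # triggered -> target class
    reg = np.sum((w - w_benign) ** 2)                    # parameter regularization
    return task_loss + backdoor_loss + lam * reg
```

The regularization term is what makes the objective "aggregation-aware": larger `lam` trades backdoor strength for updates that look statistically closer to benign clients' updates.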


Key Contributions

  • SABLE attack using semantically meaningful, in-distribution triggers (e.g., semantic attribute changes) instead of synthetic corner patches
  • Aggregation-aware malicious objective with feature separation and parameter regularization to evade robust FL aggregation defenses
  • Demonstrates high attack success rates across multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, FLAME) while maintaining benign test accuracy
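Among the aggregation rules evaluated, coordinate-wise Trimmed Mean is easy to sketch and shows why near-benign updates matter: the aggregator discards extreme per-coordinate values, so an overt outlier update is clipped away while a stealthy, benign-looking malicious update survives. A minimal sketch (the client values below are illustrative, not from the paper):

```python
import numpy as np

def trimmed_mean(updates, trim_k):
    """Coordinate-wise trimmed mean, a common robust FL aggregator:
    per coordinate, drop the trim_k smallest and trim_k largest client
    values, then average the rest."""
    u = np.sort(np.stack(updates), axis=0)  # sort client values per coordinate
    return u[trim_k: len(updates) - trim_k].mean(axis=0)
```

A blatantly scaled malicious update lands in the trimmed tail and is removed, whereas an update regularized toward the benign ones stays inside the kept band and still shifts the aggregate.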

🛡️ Threat Analysis

Data Poisoning Attack

The attack mechanism involves poisoning a subset of local training data in malicious FL clients to embed the backdoor, qualifying as data poisoning in the federated learning context.
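The poisoning step described above — relabeling only a small, interpretable subset of a malicious client's local data whose inputs already carry the semantic trigger (e.g., faces wearing sunglasses) — can be sketched as follows. The function name, fraction, and dataset layout are assumptions for illustration, not the paper's implementation.

```python
import random

def poison_local_data(dataset, target_label, poison_frac=0.05, seed=0):
    """Hypothetical sketch: flip the labels of a small fraction of a
    malicious client's (input, label) samples to the attacker's target
    class. The inputs themselves are untouched, since the semantic
    trigger (e.g., sunglasses) is already present in-distribution."""
    rng = random.Random(seed)
    data = list(dataset)
    n_poison = max(1, int(len(data) * poison_frac))
    idx = rng.sample(range(len(data)), n_poison)
    for i in idx:
        x, _ = data[i]
        data[i] = (x, target_label)  # relabel only; no synthetic patch added
    return data, idx
```

Because no pixel-level patch is injected, the poisoned samples remain visually plausible, which is what distinguishes this from classic corner-patch poisoning.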

Model Poisoning

Proposes SABLE, a backdoor attack that embeds hidden malicious behavior via semantic triggers (attribute changes such as hair color or sunglasses), which activates only on inputs containing the trigger while preserving benign accuracy.


Details

Domains
vision, federated-learning
Model Types
cnn, federated
Threat Tags
training_time, targeted
Datasets
CelebA, GTSRB
Applications
image classification, traffic sign recognition