SecureSplit: Mitigating Backdoor Attacks in Split Learning
Zhihao Dou 1, Dongfei Cui 2, Weida Wang 3, Anjun Gao 4, Yueyang Quan 5, Mengyao Ma 6, Viet Vo 7, Guangdong Bai 6, Zhuqing Liu 8,5, Minghong Fang 4
1 Case Western Reserve University
2 Northeast Electric Power University
6 The University of Queensland
Published on arXiv
2601.14054
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
SecureSplit effectively mitigates backdoor attacks across four datasets and five attack scenarios, outperforming seven alternative defenses under challenging conditions.
SecureSplit
Novel technique introduced
Split Learning (SL) offers a framework for collaborative model training that respects data privacy: participants jointly train over the same set of samples while each holds a distinct subset of features, exchanging only intermediate embeddings rather than raw data. However, SL is susceptible to backdoor attacks, in which malicious clients subtly alter their embeddings to insert hidden triggers that compromise the final trained model. To address this vulnerability, we introduce SecureSplit, a defense mechanism tailored to SL. SecureSplit applies a dimensionality transformation strategy that accentuates the subtle differences between benign and poisoned embeddings, making them separable. Building on this enhanced distinction, we develop an adaptive filtering approach that uses a majority-based voting scheme to remove contaminated embeddings while preserving clean ones. Rigorous experiments across four datasets (CIFAR-10, MNIST, CINIC-10, and ImageNette), five backdoor attack scenarios, and seven alternative defenses confirm the effectiveness of SecureSplit under various challenging conditions.
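The two-stage idea described above (transform embeddings to amplify benign/poisoned separation, then filter by majority vote) can be illustrated with a toy sketch. The paper's exact transformation and voting rule are not reproduced here; this sketch substitutes random low-dimensional projections for the dimensionality transformation and a median/MAD distance test per view for the per-view vote, so all function names, thresholds, and the demo data are illustrative assumptions.

```python
import numpy as np

def filter_embeddings(embeddings, n_views=5, dim=8, vote_threshold=0.5, seed=0):
    """Hypothetical sketch of a SecureSplit-style filter (not the authors' code).

    1) Project embeddings into several random low-dimensional views
       (a stand-in for the paper's dimensionality transformation).
    2) In each view, flag points far from the per-view median as suspicious.
    3) Keep an embedding only if a majority of views vote it clean.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)
    n, d = X.shape
    votes = np.zeros(n)
    for _ in range(n_views):
        P = rng.standard_normal((d, dim)) / np.sqrt(dim)   # random projection
        Z = X @ P
        med = np.median(Z, axis=0)
        dist = np.linalg.norm(Z - med, axis=1)             # distance to view median
        mad = np.median(np.abs(dist - np.median(dist)))    # robust spread estimate
        votes += (dist <= np.median(dist) + 2.0 * mad)     # per-view clean vote
    return votes / n_views > vote_threshold                # majority decision

# Toy demo: 95 clean embeddings near the origin, 5 "poisoned" ones shifted far away.
rng = np.random.default_rng(1)
clean = rng.standard_normal((95, 32))
poisoned = rng.standard_normal((5, 32)) + 8.0
mask = filter_embeddings(np.vstack([clean, poisoned]))
print("poisoned kept:", int(mask[-5:].sum()), "clean kept:", int(mask[:95].sum()))
```

The key design point the sketch captures is the voting step: no single view has to separate benign from poisoned embeddings perfectly, because an embedding is only discarded when most views agree it is an outlier, which keeps clean embeddings from being removed by one noisy projection.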
Key Contributions
- Dimensionality transformation strategy that amplifies subtle differences between benign and backdoor-poisoned embeddings in Split Learning
- Adaptive filtering mechanism using a majority-based voting scheme to selectively remove contaminated embeddings while preserving clean ones
- Empirical evaluation across 4 datasets, 5 backdoor attack scenarios, and comparison against 7 alternative defenses
🛡️ Threat Analysis
SecureSplit directly defends against backdoor/trojan attacks in Split Learning, where malicious clients alter intermediate embeddings to embed hidden triggers that cause targeted misbehavior in the final trained model.