Published on arXiv

2601.14054

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

SecureSplit effectively mitigates backdoor attacks across four datasets and five attack scenarios, outperforming seven alternative defenses under challenging conditions.

SecureSplit

Novel technique introduced


Split Learning (SL) offers a framework for collaborative model training that respects data privacy by allowing participants to train jointly on the same set of samples while each holds a distinct subset of features. However, SL is susceptible to backdoor attacks, in which malicious clients subtly alter their embeddings to insert hidden triggers that compromise the final trained model. To address this vulnerability, we introduce SecureSplit, a defense mechanism tailored to SL. SecureSplit applies a dimensionality transformation strategy to accentuate subtle differences between benign and poisoned embeddings, facilitating their separation. With this enhanced distinction, we develop an adaptive filtering approach that uses a majority-based voting scheme to remove contaminated embeddings while preserving clean ones. Rigorous experiments across four datasets (CIFAR-10, MNIST, CINIC-10, and ImageNette), five backdoor attack scenarios, and seven alternative defenses confirm the effectiveness of SecureSplit under various challenging conditions.


Key Contributions

  • Dimensionality transformation strategy that amplifies subtle differences between benign and backdoor-poisoned embeddings in Split Learning
  • Adaptive filtering mechanism using a majority-based voting scheme to selectively remove contaminated embeddings while preserving clean ones
  • Empirical evaluation across 4 datasets, 5 backdoor attack scenarios, and comparison against 7 alternative defenses
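The two-stage pipeline described above can be sketched in code. The paper's exact transformation and voting rule are not reproduced here; this is a minimal illustration, assuming a PCA-based projection as the dimensionality transformation and three simple outlier detectors whose majority vote decides which embeddings to drop.

```python
import numpy as np

def secure_split_filter(embs, z_thresh=3.0):
    """Illustrative SecureSplit-style filter (assumptions, not the
    paper's algorithm).

    Stage 1 -- dimensionality transformation: center the embeddings and
    project them onto their principal components, amplifying the gap
    between benign and poisoned points.
    Stage 2 -- majority-based voting: three outlier detectors each cast
    a vote; an embedding is filtered out only when a majority (>= 2 of
    3) flag it, so clean embeddings survive one noisy detector.
    Returns a boolean mask of embeddings to keep.
    """
    centered = embs - np.median(embs, axis=0)        # robust centering
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = centered @ vt.T                            # PCA scores

    def zvote(x):
        # Robust z-score outlier test via median absolute deviation.
        med = np.median(x)
        mad = np.median(np.abs(x - med)) + 1e-8
        return np.abs(x - med) / (1.4826 * mad) > z_thresh

    votes = (
        zvote(pcs[:, 0]).astype(int)                             # top-PC outlier
        + zvote(np.linalg.norm(centered, axis=1)).astype(int)    # distance outlier
        + zvote(np.linalg.norm(pcs[:, :4] / (s[:4] + 1e-8), axis=1)).astype(int)
    )
    return votes < 2                                 # keep if no majority flags it
```

A quick way to exercise the sketch is to mix a small cluster of strongly shifted "poisoned" embeddings into a benign Gaussian batch and check that the filter isolates them while retaining almost all clean points.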

🛡️ Threat Analysis

Model Poisoning

SecureSplit directly defends against backdoor/trojan attacks in Split Learning, where malicious clients alter intermediate embeddings to embed hidden triggers that cause targeted misbehavior in the final trained model.
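To make the threat concrete, the attacker-side step can be sketched as follows. This is a hypothetical illustration of an embedding-level backdoor, not an attack from the paper: the `target_proto` prototype and blending weight `alpha` are assumptions.

```python
import numpy as np

def poison_embeddings(embs, trigger_mask, target_proto, alpha=0.9):
    """Hypothetical malicious-client step in Split Learning.

    For samples carrying the backdoor trigger, the client blends its
    honest embedding toward a target-class prototype, so the server-side
    model learns to associate the trigger with the attacker's label.
    Embeddings of clean samples are left untouched.
    """
    out = embs.copy()
    out[trigger_mask] = (1 - alpha) * embs[trigger_mask] + alpha * target_proto
    return out
```

Because only a small fraction of embeddings is perturbed and the shift can be subtle, such poisoning is hard to spot in the raw embedding space, which motivates SecureSplit's transformation-then-vote defense.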


Details

Domains
vision, federated-learning
Model Types
cnn
Threat Tags
training_time, targeted, grey_box
Datasets
CIFAR-10, MNIST, CINIC-10, ImageNette
Applications
collaborative model training, split learning