ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version)
Nojan Sheybani 1, Alessandro Pegoraro 2, Jonathan Knauer 2, Phillip Rieger 2, Elissa Mollakuqe 2, Farinaz Koushanfar 1, Ahmad-Reza Sadeghi 2
Published on arXiv: 2509.09787
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Reduces backdoor attack success rate to below 6% in Split Learning while incurring less than 10 seconds of overhead for models with up to 1 million client-side parameters.
ZORRO
Novel technique introduced
Split Learning (SL) is a distributed learning approach that enables resource-constrained clients to collaboratively train deep neural networks (DNNs) by offloading most layers to a central server while keeping the input and output layers on the client side. This setup allows SL to leverage the server's computational capacity without sharing raw data, making it highly effective in resource-constrained environments that handle sensitive data. However, its distributed nature also enables malicious clients to manipulate the training process: by sending poisoned intermediate gradients, they can inject backdoors into the shared DNN. Existing defenses are limited, as they often focus on server-side protection and introduce additional overhead for the server. A significant challenge for client-side defenses is forcing malicious clients to correctly execute the defense algorithm. We present ZORRO, a private, verifiable, and robust SL defense scheme. Through our novel design and application of interactive zero-knowledge proofs (ZKPs), clients prove their correct execution of a client-located defense algorithm, resulting in proofs of computational integrity attesting to the benign nature of locally trained DNN portions. Leveraging the frequency representation of model partitions enables ZORRO to conduct an in-depth inspection of locally trained models in an untrusted environment, ensuring that each client forwards a benign checkpoint to its succeeding client. In our extensive evaluation, covering different model architectures as well as various attack strategies and data scenarios, we show ZORRO's effectiveness: it reduces the attack success rate to less than 6% while incurring an overhead of less than 10 seconds, even for models storing 1,000,000 parameters on the client side.
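The frequency-domain inspection can be illustrated with a minimal sketch (this is not ZORRO's actual algorithm; the naive DCT, function names, cutoff, and threshold below are all illustrative assumptions). Backdoor manipulations of a weight vector often shift spectral energy into atypical frequency bands, so a simple check is to take the DCT of the flattened client-side weights and flag checkpoints whose high-frequency energy ratio is too large:

```python
import math

def dct_ii(x):
    """Naive type-II DCT of a list of floats (O(n^2), for illustration only)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def high_freq_energy_ratio(weights, cutoff=0.5):
    """Fraction of spectral energy above the cutoff frequency index (illustrative)."""
    coeffs = dct_ii(weights)
    split = int(len(coeffs) * cutoff)
    total = sum(c * c for c in coeffs)
    high = sum(c * c for c in coeffs[split:])
    return high / total if total > 0 else 0.0

def looks_benign(weights, threshold=0.3):
    """Accept a checkpoint only if its high-frequency energy is modest."""
    return high_freq_energy_ratio(weights) <= threshold
```

A smooth weight vector such as `[1.0] * 8` concentrates its energy in the DC coefficient and passes, while a rapidly alternating one like `[1, -1, 1, -1, ...]` concentrates energy in the high frequencies and is rejected. In ZORRO, it is the correct execution of such an inspection, not the inspection alone, that the client additionally proves in zero knowledge.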
Key Contributions
- ZORRO: a Split Learning defense scheme combining DCT-based frequency-domain inspection of client model partitions with interactive zero-knowledge proofs (ZKPs) to verifiably enforce correct execution of the defense algorithm on untrusted clients.
- Novel use of ZKPs to produce proofs of computational integrity attesting to the benign nature of locally trained DNN portions, addressing the fundamental challenge of enforcing client-side defenses.
- Empirical evaluation showing attack success rate reduction to <6% across diverse model architectures and attack strategies, with overhead under 10 seconds for 1M-parameter client-side models.
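To give a feel for the interactive-ZKP building block mentioned above, here is a toy Schnorr-style sigma protocol proving knowledge of a discrete logarithm. This is a standard textbook protocol, not ZORRO's proof system, and the group parameters are demo-sized and cryptographically insecure:

```python
import secrets

# Toy group: G = 4 generates the prime-order-Q subgroup of Z_P^*.
# Demo-sized parameters -- insecure, for illustration only.
P, Q, G = 2039, 1019, 4

def schnorr_round(x):
    """One interactive round: prover knows x such that y = G^x mod P."""
    y = pow(G, x, P)             # public statement shared with the verifier
    r = secrets.randbelow(Q)     # prover picks a random nonce...
    t = pow(G, r, P)             # ...and sends the commitment t
    c = secrets.randbelow(Q)     # verifier replies with a random challenge
    s = (r + c * x) % Q          # prover's response; s alone reveals nothing about x
    return pow(G, s, P) == (t * pow(y, c, P)) % P  # verifier's acceptance check
```

ZORRO's statements are far richer (correct execution of the whole client-located defense over a model partition), but the same commit/challenge/response interaction pattern underlies interactive ZKPs of this kind.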
🛡️ Threat Analysis
The paper directly defends against backdoor injection in Split Learning: malicious clients send poisoned intermediate gradients to embed hidden, targeted behavior in the shared DNN. ZORRO uses DCT-based frequency analysis and ZKPs to verify that clients correctly execute the defense, reducing the attack success rate to below 6%. This is a canonical ML10 scenario (backdoor/trojan defense) in a distributed learning setting.