defense 2025

Practical and Private Hybrid ML Inference with Fully Homomorphic Encryption

Sayan Biswas 1, Philippe Chartier 2,3,4, Akash Dhasade 1, Tom Jurien 1, David Kerriou 5, Anne-Marie Kermarrec 1, Mohammed Lemou 3,4,6, Franklin Tranie 1, Martijn de Vos 1, Milos Vujasinovic 1

0 citations

Published on arXiv: 2509.01253

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Safhire achieves 1.5×–10.5× lower inference latency than Orion on ResNet-20/18/34 while provably protecting model confidentiality against clients observing intermediate layer outputs.

Safhire

Novel technique introduced


In contemporary cloud-based services, protecting users' sensitive data and ensuring the confidentiality of the server's model are critical. Fully homomorphic encryption (FHE) enables inference directly on encrypted inputs, but its practicality is hindered by expensive bootstrapping and inefficient approximations of non-linear activations. We introduce Safhire, a hybrid inference framework that executes linear layers under encryption on the server while offloading non-linearities to the client in plaintext. This design eliminates bootstrapping, supports exact activations, and significantly reduces computation. To safeguard model confidentiality despite client access to intermediate outputs, Safhire applies randomized shuffling, which obfuscates intermediate values and makes it practically impossible to reconstruct the model. To further reduce latency, Safhire incorporates advanced optimizations such as fast ciphertext packing and partial extraction. Evaluations on multiple standard models and datasets show that Safhire achieves 1.5×–10.5× lower inference latency than Orion, a state-of-the-art baseline, with manageable communication overhead and comparable accuracy, thereby establishing the practicality of hybrid FHE inference.
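The division of labor described in the abstract can be illustrated with a small plaintext simulation. This is only a sketch under stated assumptions: `MockCiphertext`, `encrypt`, `decrypt`, `server_linear`, and `client_activation` are hypothetical names, no real FHE library is used, and the toy two-layer network is not one of the evaluated models. It shows only the control flow (server evaluates linear layers on "encrypted" data, client applies the exact activation in plaintext and re-encrypts) and omits shuffling, packing, and partial extraction.

```python
import numpy as np


class MockCiphertext:
    """Stand-in for an FHE ciphertext. This is NOT encryption; it only
    marks which values the server is supposed to treat as opaque."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=float)


def encrypt(x):
    # Hypothetical client-side helper (assumption, not a real FHE API).
    return MockCiphertext(x)


def decrypt(ct):
    # Hypothetical client-side helper (assumption, not a real FHE API).
    return ct._values.copy()


def server_linear(ct, weight, bias):
    # A linear layer needs only additions and multiplications by plaintext
    # weights, which FHE schemes can evaluate homomorphically. Here the
    # arithmetic is simulated in the clear.
    return MockCiphertext(weight @ ct._values + bias)


def client_activation(ct):
    # The client decrypts, applies the exact ReLU in plaintext, re-encrypts.
    # Fresh re-encryption is what removes the need for bootstrapping.
    return encrypt(np.maximum(decrypt(ct), 0.0))


def hybrid_inference(x, layers):
    ct = encrypt(x)
    for i, (weight, bias) in enumerate(layers):
        ct = server_linear(ct, weight, bias)
        if i < len(layers) - 1:  # no activation after the final layer
            ct = client_activation(ct)
    return decrypt(ct)


def plain_inference(x, layers):
    # Reference computation with no client/server split.
    h = np.asarray(x, dtype=float)
    for i, (weight, bias) in enumerate(layers):
        h = weight @ h + bias
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h


rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 4)), rng.normal(size=8)),
    (rng.normal(size=(3, 8)), rng.normal(size=3)),
]
x = rng.normal(size=4)
```

Calling `hybrid_inference(x, layers)` should give the same logits as `plain_inference(x, layers)`, since the activation is exact rather than a polynomial approximation. In a real deployment each `client_activation` call would be a network round trip, which is where the communication overhead mentioned in the abstract comes from.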


Key Contributions

  • Hybrid FHE inference framework (Safhire) that eliminates bootstrapping and polynomial non-linear approximations by offloading activations to the client in plaintext while keeping linear layers encrypted on the server
  • Randomized shuffling mechanism with formal differential privacy guarantees to obfuscate intermediate outputs and prevent client-side model reconstruction
  • Fast ciphertext packing and partial extraction optimizations achieving 1.5×–10.5× latency reduction over the Orion state-of-the-art FHE baseline

🛡️ Threat Analysis

Model Theft

The paper's key security contribution is defending against model theft: in the hybrid scheme, the client decrypts intermediate layer outputs and could use them to reconstruct the server's proprietary model weights. Safhire's randomized server-side shuffling (with per-session secret seeds and differential privacy guarantees) is a direct defense against this model reconstruction/IP-theft attack, explicitly framed as protecting 'model confidentiality' from an adversarial client.
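Why shuffling does not break correctness can be seen in a few lines: an element-wise activation commutes with any permutation of the coordinates, so the server can permute its output, let the client apply the activation, and then undo the permutation. The sketch below is a plaintext illustration under assumptions: `session_permutation` and `shuffled_activation` are hypothetical names, the seed handling is simplified, and in Safhire the permutation would be applied to encrypted values. The summary above does not specify the shuffling granularity or how the differential privacy guarantee is derived, so the code demonstrates only the commutation property, not the privacy guarantee.

```python
import numpy as np


def session_permutation(seed, n):
    # Derive a permutation from a per-session secret seed (assumed to be
    # known only to the server) together with its inverse.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    inverse = np.argsort(perm)  # satisfies perm[inverse[j]] == j
    return perm, inverse


def relu(values):
    return np.maximum(values, 0.0)


def shuffled_activation(server_output, seed):
    perm, inverse = session_permutation(seed, server_output.size)
    shuffled = server_output[perm]   # server permutes before sending
    activated = relu(shuffled)       # client sees only the shuffled order
    restored = activated[inverse]    # server undoes the permutation
    return restored, shuffled


rng = np.random.default_rng(1)
layer_output = rng.normal(size=16)
restored, client_view = shuffled_activation(layer_output, seed=12345)
```

Here `restored` should equal `relu(layer_output)`, while `client_view` contains the same values in an order the client cannot map back to neuron positions without the seed.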


Details

Domains
vision
Model Types
cnn
Threat Tags
grey_box, inference_time
Datasets
CIFAR-10, Tiny ImageNet, ImageNet
Applications
cloud ml inference, image classification