defense 2025

Practical and Private Hybrid ML Inference with Fully Homomorphic Encryption

Sayan Biswas ¹, Philippe Chartier ²,³,⁴, Akash Dhasade ¹, Tom Jurien ¹, David Kerriou ⁵, Anne-Marie Kermarrec ¹, Mohammed Lemou ³,⁴,⁶, Franklin Tranie ¹, Martijn de Vos ¹, Milos Vujasinovic ¹

¹ EPFL

² Inria

³ IRMAR

⁴ Université de Rennes

⁵ École Polytechnique

⁶ CNRS


Published on arXiv: 2509.01253

Model Theft

OWASP ML Top 10 — ML05

Key Finding

Safhire achieves 1.5×–10.5× lower inference latency than Orion on ResNet-20/18/34 while provably protecting model confidentiality against clients observing intermediate layer outputs.

Safhire

Novel technique introduced

In contemporary cloud-based services, protecting users' sensitive data and ensuring the confidentiality of the server's model are critical. Fully homomorphic encryption (FHE) enables inference directly on encrypted inputs, but its practicality is hindered by expensive bootstrapping and inefficient approximations of non-linear activations. We introduce Safhire, a hybrid inference framework that executes linear layers under encryption on the server while offloading non-linearities to the client in plaintext. This design eliminates bootstrapping, supports exact activations, and significantly reduces computation. To safeguard model confidentiality despite client access to intermediate outputs, Safhire applies randomized shuffling, which obfuscates intermediate values and makes it practically impossible to reconstruct the model. To further reduce latency, Safhire incorporates advanced optimizations such as fast ciphertext packing and partial extraction. Evaluations on multiple standard models and datasets show that Safhire achieves 1.5×–10.5× lower inference latency than Orion, a state-of-the-art baseline, with manageable communication overhead and comparable accuracy, thereby establishing the practicality of hybrid FHE inference.
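The hybrid data flow described in the abstract can be sketched as a plaintext simulation. This is a minimal illustration under stated assumptions: `encrypt` and `decrypt` are hypothetical identity stand-ins for a real FHE scheme (no cryptography happens here), and every function name is illustrative rather than Safhire's API. It shows the structural point of the design: the server only evaluates affine layers, while each ReLU is computed exactly by the client, so neither bootstrapping nor a polynomial approximation is needed, at the cost of one client round trip per activation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical stand-ins: a real deployment would use an FHE scheme here.
# "Ciphertexts" are plain lists so the data flow can be run and checked.
def encrypt(x: Vector) -> Vector:
    return list(x)

def decrypt(ct: Vector) -> Vector:
    return list(ct)

def server_linear(ct: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Server side: affine layer W·x + b evaluated on the (simulated) ciphertext."""
    return [sum(w * v for w, v in zip(row, ct)) + b for row, b in zip(weights, bias)]

def client_relu(ct: Vector) -> Vector:
    """Client side: decrypt, apply the exact ReLU in plaintext, re-encrypt."""
    plain = decrypt(ct)
    return encrypt([max(0.0, v) for v in plain])

def hybrid_inference(x: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    ct = encrypt(x)
    for i, (w, b) in enumerate(layers):
        ct = server_linear(ct, w, b)
        if i < len(layers) - 1:   # no activation after the final layer
            ct = client_relu(ct)  # one client round trip per non-linearity
    return decrypt(ct)
```

For example, with `layers = [([[1.0, -1.0], [2.0, 0.5]], [0.0, -1.0]), ([[1.0, 1.0]], [0.5])]` and input `[1.0, 2.0]`, the first layer yields `[-1.0, 2.0]`, the client's exact ReLU turns that into `[0.0, 2.0]`, and the final output is `[2.5]`.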

Key Contributions

Hybrid FHE inference framework (Safhire) that eliminates bootstrapping and polynomial non-linear approximations by offloading activations to the client in plaintext while keeping linear layers encrypted on the server
Randomized shuffling mechanism with formal differential privacy guarantees to obfuscate intermediate outputs and prevent client-side model reconstruction
Fast ciphertext packing and partial extraction optimizations that further reduce latency; overall, Safhire achieves 1.5×–10.5× lower latency than the state-of-the-art Orion FHE baseline
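The summary does not detail Safhire's packing scheme. As a point of reference only, the sketch below shows a standard SIMD packing technique for encrypted matrix-vector products, the diagonal method of Halevi and Shoup, simulated on plaintext slot vectors. `rotate` stands in for a ciphertext slot rotation, and the sketch assumes a square n×n matrix with the input vector packed into n slots. It is not claimed to be the paper's "fast ciphertext packing".

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def rotate(slots: Vector, k: int) -> Vector:
    """Cyclic left rotation of a slot vector (stand-in for an FHE slot rotation)."""
    k %= len(slots)
    return slots[k:] + slots[:k]

def diagonal(matrix: Matrix, k: int) -> Vector:
    """k-th generalized diagonal: entry i is matrix[i][(i + k) mod n]."""
    n = len(matrix)
    return [matrix[i][(i + k) % n] for i in range(n)]

def diag_matvec(matrix: Matrix, x: Vector) -> Vector:
    """y = M·x as a sum over k of diagonal_k ⊙ rotate(x, k), using only slot-wise ops."""
    n = len(x)
    acc = [0.0] * n
    for k in range(n):
        d = diagonal(matrix, k)   # plaintext, known to the server
        r = rotate(x, k)          # one rotation of the packed input per diagonal
        acc = [a + di * ri for a, di, ri in zip(acc, d, r)]
    return acc
```

Slot i accumulates M[i][(i+k) mod n] · x[(i+k) mod n] over all k, which is exactly row i of M·x. The method uses n rotations for an n×n matrix; since rotations are typically among the costlier FHE operations, packing layouts that reduce them matter for latency.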

🛡️ Threat Analysis

Model Theft

The paper's key security contribution is defending against model theft: in the hybrid scheme, the client decrypts intermediate layer outputs and could use them to reconstruct the server's proprietary model weights. Safhire's randomized server-side shuffling (with per-session secret seeds and differential privacy guarantees) is a direct defense against this model reconstruction/IP-theft attack, explicitly framed as protecting 'model confidentiality' from an adversarial client.
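One way shuffling with per-session secret seeds can coexist with client-side activations is sketched below, under assumptions that go beyond this summary: the server permutes the hidden neurons of a layer using a permutation derived from a secret seed, and folds the inverse permutation into the next layer's input columns. Because ReLU is elementwise, it commutes with the permutation, so the client computes the correct activations on values whose neuron identities are hidden, and the final output is unchanged. All names are hypothetical, encryption is omitted, and the sketch does not model the paper's differential-privacy analysis.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]

def session_permutation(n: int, seed: int) -> List[int]:
    """Hypothetical: derive a secret permutation of n hidden neurons from a per-session seed."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm

def permute_rows(w: Matrix, b: Vector, perm: List[int]) -> Tuple[Matrix, Vector]:
    """Shuffle a layer's outputs: output slot i now holds original neuron perm[i]."""
    return [w[p] for p in perm], [b[p] for p in perm]

def permute_cols(w: Matrix, perm: List[int]) -> Matrix:
    """Re-index the next layer's inputs so it consumes the shuffled activations."""
    return [[row[p] for p in perm] for row in w]

def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

def two_layer(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector,
              seed: Optional[int] = None) -> Tuple[Vector, Vector]:
    """Return (hidden values the client would see after decrypting, final output)."""
    if seed is not None:
        perm = session_permutation(len(b1), seed)
        w1, b1 = permute_rows(w1, b1, perm)  # server-side, on plaintext weights
        w2 = permute_cols(w2, perm)          # undo the shuffle inside the next layer
    hidden = relu(linear(x, w1, b1))         # elementwise ReLU commutes with the shuffle
    return hidden, linear(hidden, w2, b2)
```

Running `two_layer` with and without a seed gives the same output, while the hidden vector the client sees arrives in a seed-dependent order. The client still observes the multiset of hidden values; the paper states formal differential-privacy guarantees for its mechanism, which this ordering-only sketch does not capture.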

Details

Domains

vision

Model Types

cnn

Threat Tags

grey_box, inference_time

Datasets

CIFAR-10, Tiny ImageNet, ImageNet

Applications

cloud ML inference, image classification

Similar Papers

A PUF-Based Approach for Copy Protection of Intellectual Property in Neural Network Models

DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks

Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach

IrisFP: Adversarial-Example-based Model Fingerprinting with Enhanced Uniqueness and Robustness

Re-Key-Free, Risky-Free: Adaptable Model Usage Control

Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks

SPOILER: TEE-Shielded DNN Partitioning of On-Device Secure Inference with Poison Learning

DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks