Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks
Published on arXiv
2509.16546
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Defense incurs less than 1% accuracy change while blocking cryptanalytic parameter extraction that succeeds within 14 minutes to 4 hours on unprotected networks
Extraction-Aware Training
Novel technique introduced
Neural networks are valuable intellectual property due to the significant computational cost, expert labor, and proprietary data involved in their development. Consequently, protecting their parameters is critical not only for maintaining a competitive advantage but also for enhancing the model's security and privacy. Prior works have demonstrated the growing capability of cryptanalytic attacks to scale to deeper models. In this paper, we present the first defense mechanism against cryptanalytic parameter extraction attacks. Our key insight is to eliminate the neuron uniqueness these attacks need to succeed. We achieve this through a novel extraction-aware training method. Specifically, we augment the standard loss function with a regularization term that minimizes the distance between neuron weights within a layer. Because the defense operates entirely at training time, it incurs zero area-delay overhead during inference. We evaluate the effectiveness of our approach in mitigating extraction attacks while analyzing model accuracy across different architectures and datasets. When re-trained with the same model architecture, the results show that our defense incurs a marginal accuracy change of less than 1% with the modified loss function. Moreover, we present a theoretical framework to quantify the success probability of the attack. When tested comprehensively under prior attack settings, our defense withstood sustained extraction attempts, whereas unprotected networks are fully extracted within 14 minutes to 4 hours.
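The core idea, augmenting the task loss with a penalty on pairwise distances between neuron weight vectors in a layer, can be sketched as follows. This is a minimal illustration assuming a squared-L2 distance over the rows of a layer's weight matrix; the paper's exact distance metric, weighting, and training loop may differ.

```python
import numpy as np

def weight_similarity_penalty(W):
    """Regularization term: mean pairwise squared L2 distance between
    neuron weight vectors (rows of W) within one layer. Driving this
    toward zero reduces the neuron uniqueness that cryptanalytic
    extraction attacks exploit. Hypothetical sketch, not the paper's code."""
    n = W.shape[0]
    diffs = W[:, None, :] - W[None, :, :]   # (n, n, d) pairwise differences
    sq_dists = np.sum(diffs ** 2, axis=-1)  # (n, n) squared distances
    return sq_dists.sum() / (n * (n - 1))   # mean over ordered pairs

def extraction_aware_loss(task_loss, W, lam=0.1):
    # Augmented objective: standard loss plus the similarity penalty,
    # weighted by an assumed hyperparameter lam.
    return task_loss + lam * weight_similarity_penalty(W)

# Identical neurons incur zero penalty; distinct neurons are penalized.
W_same = np.array([[1.0, 2.0], [1.0, 2.0]])
W_diff = np.array([[0.0, 0.0], [1.0, 0.0]])
print(weight_similarity_penalty(W_same))          # 0.0
print(extraction_aware_loss(0.5, W_diff, lam=0.1))
```

Minimizing this term pushes neurons within a layer toward similar weights, which is why the defense adds no inference-time overhead: it only changes the values the weights converge to during training.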
Key Contributions
- First defense mechanism against cryptanalytic neural network parameter extraction attacks
- Extraction-aware training that augments the loss with a regularization term minimizing inter-neuron weight distances within a layer to eliminate neuron uniqueness required by cryptanalytic attacks
- Theoretical framework quantifying the success probability of cryptanalytic extraction, with empirical results showing less than 1% accuracy change while preventing extraction that otherwise completes in 14 minutes to 4 hours
🛡️ Threat Analysis
Cryptanalytic parameter extraction attacks directly target a model's weights and parameters, a form of model theft. The paper proposes a training-based defense that eliminates the neuron uniqueness these attacks exploit, protecting model intellectual property from reconstruction by an adversary querying the model.