
Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks

Ashley Kurian, Aydin Aysu



Published on arXiv (arXiv:2509.16546)

Model Theft

OWASP ML Top 10 — ML05

Key Finding

The defense incurs less than a 1% accuracy change while blocking cryptanalytic parameter extraction attacks that succeed within 14 minutes to 4 hours against unprotected networks

Extraction-Aware Training

Novel technique introduced


Neural networks are valuable intellectual property due to the significant computational cost, expert labor, and proprietary data involved in their development. Consequently, protecting their parameters is critical not only for maintaining a competitive advantage but also for enhancing the model's security and privacy. Prior works have demonstrated the growing capability of cryptanalytic attacks to scale to deeper models. In this paper, we present the first defense mechanism against cryptanalytic parameter extraction attacks. Our key insight is to eliminate the neuron uniqueness necessary for these attacks to succeed. We achieve this through a novel extraction-aware training method. Specifically, we augment the standard loss function with an additional regularization term that minimizes the distance between neuron weights within a layer. Because the regularization is applied only during training, the proposed defense has zero area-delay overhead during inference. We evaluate the effectiveness of our approach in mitigating extraction attacks while analyzing model accuracy across different architectures and datasets. When re-trained with the same model architecture, the results show that our defense incurs a marginal accuracy change of less than 1% with the modified loss function. Moreover, we present a theoretical framework to quantify the success probability of the attack. When tested comprehensively under prior attack settings, our defense withstood sustained extraction attempts, whereas unprotected networks were extracted in 14 minutes to 4 hours.


Key Contributions

  • First defense mechanism against cryptanalytic neural network parameter extraction attacks
  • Extraction-aware training that augments the loss with a regularization term minimizing inter-neuron weight distances within a layer to eliminate neuron uniqueness required by cryptanalytic attacks
  • Theoretical framework quantifying the success probability of cryptanalytic extraction, with empirical results showing <1% accuracy degradation while preventing extraction that otherwise completes in 14 minutes to 4 hours
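The extraction-aware regularizer described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names (`weight_similarity_penalty`, `extraction_aware_loss`), the choice of mean pairwise L2 distance, and the `lam` weighting are all assumptions; the paper's exact regularization term may differ.

```python
import numpy as np

def weight_similarity_penalty(W):
    """Mean pairwise L2 distance between neuron weight vectors (rows of W)
    within one layer. Driving this toward zero reduces the neuron uniqueness
    that cryptanalytic extraction attacks rely on to distinguish neurons.
    Hypothetical form; the paper's exact regularizer may differ."""
    n = W.shape[0]
    if n < 2:
        return 0.0
    diffs = W[:, None, :] - W[None, :, :]       # (n, n, d) pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (n, n) pairwise L2 distances
    return dists.sum() / (n * (n - 1))          # mean over ordered pairs

def extraction_aware_loss(task_loss, layer_weights, lam=0.1):
    """Augmented training objective: standard task loss plus a weighted sum
    of per-layer weight-similarity penalties (lam is a hypothetical knob)."""
    return task_loss + lam * sum(weight_similarity_penalty(W)
                                 for W in layer_weights)
```

In an actual training loop this term would be added to the loss inside an autodiff framework so its gradient pulls neuron weight vectors within each layer closer together, trading a small amount of accuracy for resistance to extraction.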

🛡️ Threat Analysis

Model Theft

Cryptanalytic parameter extraction attacks target the model's weights/parameters directly — this is model theft. The paper proposes a training-based defense that eliminates the neuron uniqueness these attacks exploit, protecting model intellectual property from being reconstructed by an adversary querying the model.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, grey_box, inference_time, training_time
Applications
neural network IP protection, model parameter security