Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?
Akira Ito, Takayuki Miura, Yosuke Todo
Published on arXiv (2510.06692)
Model Theft
OWASP ML Top 10 — ML05
Key Finding
Existing polynomial-time hard-label model extraction in fact requires exponentially many queries as network depth grows, because of persistent (always-active) neurons; CrossLayer Extraction mitigates this by recovering equivalent model behavior through cross-layer neuron-interaction analysis.
CrossLayer Extraction
Novel technique introduced
Deep Neural Networks (DNNs) have attracted significant attention, and their internal models are now considered valuable intellectual assets. Extracting these internal models through access to a DNN is conceptually similar to extracting a secret key via oracle access to a block cipher. Consequently, cryptanalytic techniques, particularly differential-like attacks, have been actively explored recently. ReLU-based DNNs are the most commonly and widely deployed architectures. While early works (e.g., Crypto 2020, Eurocrypt 2024) assume access to exact output logits, which are usually not exposed to users, more recent works (e.g., Asiacrypt 2024, Eurocrypt 2025) focus on the hard-label setting, where only the final classification result (e.g., "dog" or "car") is available to the attacker. Notably, Carlini et al. (Eurocrypt 2025) demonstrated that model extraction is feasible in polynomial time even under this restricted setting. In this paper, we first show that the assumptions underlying their attack become increasingly unrealistic as the depth of the target network grows. In practice, satisfying these assumptions requires an exponential number of queries with respect to the attack depth, implying that the attack does not always run in polynomial time. To address this critical limitation, we propose a novel attack method called CrossLayer Extraction. Instead of directly extracting the secret parameters (e.g., weights and biases) of a specific neuron, which incurs exponential cost, we exploit neuron interactions across layers to extract this information from deeper layers. This technique significantly reduces query complexity and mitigates the limitations of existing model extraction approaches.
Key Contributions
- Demonstrates that Carlini et al.'s (Eurocrypt 2025) polynomial-time hard-label extraction assumes unrealistic neuron activation distributions for deep networks, requiring exponential queries in practice due to 'persistent' (always-active) neurons
- Introduces CrossLayer Extraction, which exploits neuron interactions across layers rather than extracting per-neuron parameters directly, significantly reducing query complexity for deeper networks
- Provides formal correctness bounds: if there are n ε-persistent and m ε-dead neurons, the recovered model is correct with probability (1−ε)^(n+m)
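The role of persistent neurons in the correctness bound can be illustrated with a small simulation. The sketch below (hypothetical weights, bias, and neuron counts; not the paper's construction) shows how a large positive bias makes a ReLU neuron active on nearly all random inputs, so its inactive-side behavior is rarely observed; treating such a neuron as always-active then fails on a fresh input with some small probability ε, giving the (1−ε)^(n+m) style bound quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single hidden neuron with pre-activation w.x + b.
# A large positive bias makes the ReLU almost always active
# ("epsilon-persistent"), so queries rarely witness its sign flip.
w = rng.normal(size=8)
b = 4.0  # large bias -> active on the vast majority of inputs

# Empirically estimate epsilon: the fraction of random inputs on
# which the neuron is inactive (pre-activation <= 0).
x = rng.normal(size=(100_000, 8))
pre = x @ w + b
eps = float(np.mean(pre <= 0.0))
print(f"estimated epsilon = {eps:.4f}")

# Paper's bound (as summarized above): with n epsilon-persistent and
# m epsilon-dead neurons, the recovered model is correct with
# probability (1 - eps)^(n + m). Counts n, m here are illustrative.
n, m = 5, 3
bound = (1.0 - eps) ** (n + m)
print(f"correctness bound = {bound:.4f}")
```

The bound shrinks geometrically in n + m, which is why even a small per-neuron ε becomes costly once many persistent or dead neurons accumulate in deeper layers.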
🛡️ Threat Analysis
The paper both analyzes and improves upon cryptanalytic model extraction attacks — techniques for recovering secret model weights and biases from black-box query access to a ReLU DNN. CrossLayer Extraction is a novel method for stealing model parameters more efficiently, which is the core ML05 threat.