DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks
Kacem Khaled, Felipe Gohring de Magalhães, Gabriela Nicolescu
Published on arXiv (arXiv:2512.23948)
Model Theft
OWASP ML Top 10 — ML05
Key Finding
DivQAT successfully degrades extracted surrogate model performance while maintaining victim model accuracy, and further improves effectiveness when combined with existing post-hoc defenses.
DivQAT
Novel technique introduced
Convolutional Neural Networks (CNNs) and their quantized counterparts are vulnerable to extraction attacks, posing a significant threat of IP theft. Yet the robustness of quantized models against these attacks has received far less study than that of large models. Previous defenses inject calculated noise into the prediction probabilities. However, these defenses are limited because they are not incorporated into the model design and are only bolted on after training. Additionally, most defense techniques are computationally expensive and often rest on assumptions about the victim model that are unrealistic for edge-device deployments and do not hold for quantized models. In this paper, we propose DivQAT, a novel algorithm for training quantized CNNs based on Quantization Aware Training (QAT), aiming to enhance their robustness against extraction attacks. To the best of our knowledge, our technique is the first to modify the quantization process so that a model extraction defense is integrated into training itself. Through empirical validation on benchmark vision datasets, we demonstrate the efficacy of our technique in defending against model extraction attacks without compromising model accuracy. Furthermore, combining our quantization technique with other defense mechanisms improves their effectiveness compared to traditional QAT.
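For context, standard QAT simulates integer arithmetic during training by "fake-quantizing" weights and activations in the forward pass (quantize to integers, then immediately dequantize); DivQAT's contribution is to modify this process, and the exact modification is detailed in the paper. Below is a minimal NumPy sketch of ordinary asymmetric 8-bit fake quantization, not the paper's algorithm; the function name and scheme are illustrative choices:

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Simulate integer quantization in floating point ("fake quantization"),
    as used in standard QAT: map to [0, 2^b - 1], round, then dequantize.
    Assumes w is a non-constant array (so the scale is nonzero)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)   # float step per integer level
    zero_point = qmin - round(float(w.min()) / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)  # integer grid
    return (q - zero_point) * scale               # values the forward pass sees
```

The round-trip error is at most about half a quantization step, which is why QAT-trained models stay accurate once deployed as true integer models.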
Key Contributions
- DivQAT: first method to integrate a model extraction defense directly into the quantization-aware training process rather than as a post-hoc add-on
- Demonstrates that DivQAT reduces surrogate model fidelity without degrading victim model accuracy on benchmark vision datasets
- Shows DivQAT is composable with existing defense mechanisms, improving their effectiveness beyond traditional QAT baselines
🛡️ Threat Analysis
Directly defends against model extraction attacks, in which an adversary queries a quantized CNN and uses its outputs to clone its functionality. DivQAT modifies the QAT training process so that extracted surrogate models are less faithful, protecting model IP.