DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks
Kacem Khaled, Felipe Gohring de Magalhães, Gabriela Nicolescu
Published on arXiv (arXiv:2512.23948)
Model Theft
OWASP ML Top 10 — ML05
Key Finding
DivQAT successfully degrades extracted surrogate model performance while maintaining victim model accuracy, and further improves effectiveness when combined with existing post-hoc defenses.
DivQAT
Novel technique introduced
Convolutional Neural Networks (CNNs) and their quantized counterparts are vulnerable to extraction attacks, posing a significant threat of IP theft. Yet the robustness of quantized models against these attacks has received far less study than that of large models. Previous defenses inject calculated noise into the prediction probabilities. However, these defenses are limited because they are not incorporated into the model design and are only bolted on after training. Additionally, most defense techniques are computationally expensive and often rest on assumptions about the victim model that are unrealistic for edge-device deployments and do not hold for quantized models. In this paper, we propose DivQAT, a novel algorithm for training quantized CNNs based on Quantization Aware Training (QAT), aiming to enhance their robustness against extraction attacks. To the best of our knowledge, our technique is the first to modify the quantization process so that a model extraction defense is integrated into training itself. Through empirical validation on benchmark vision datasets, we demonstrate the efficacy of our technique in defending against model extraction attacks without compromising model accuracy. Furthermore, combining our quantization technique with other defense mechanisms improves their effectiveness compared to traditional QAT.
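For context, standard QAT simulates integer arithmetic during training by "fake-quantizing" weights and activations in the forward pass (quantize to integers, then immediately dequantize); DivQAT's contribution is to modify this process, and the exact modification is detailed in the paper. Below is a minimal NumPy sketch of ordinary asymmetric 8-bit fake quantization, not the paper's algorithm; the function name and scheme are illustrative choices:

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Simulate integer quantization in floating point ("fake quantization"),
    as used in standard QAT: map to [0, 2^b - 1], round, then dequantize.
    Assumes w is a non-constant array (so the scale is nonzero)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)   # float step per integer level
    zero_point = qmin - round(float(w.min()) / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)  # integer grid
    return (q - zero_point) * scale               # values the forward pass sees
```

The round-trip error is at most about half a quantization step, which is why QAT-trained models stay accurate once deployed as true integer models.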
Key Contributions
- DivQAT: first method to integrate a model extraction defense directly into the quantization-aware training process rather than as a post-hoc add-on
- Demonstrates that DivQAT reduces surrogate model fidelity without degrading victim model accuracy on benchmark vision datasets
- Shows DivQAT is composable with existing defense mechanisms, improving their effectiveness beyond traditional QAT baselines
🛡️ Threat Analysis
Directly defends against model extraction attacks, in which an adversary queries a quantized CNN and uses its outputs to clone its functionality. DivQAT modifies the QAT training process so that extracted surrogate models are less faithful, protecting model IP.