
DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks

Kacem Khaled, Felipe Gohring de Magalhães, Gabriela Nicolescu

0 citations · 49 references · arXiv

Published on arXiv

2512.23948

Model Theft

OWASP ML Top 10 — ML05

Key Finding

DivQAT successfully degrades extracted surrogate model performance while maintaining victim model accuracy, and further improves effectiveness when combined with existing post-hoc defenses.

DivQAT

Novel technique introduced


Convolutional Neural Networks (CNNs) and their quantized counterparts are vulnerable to extraction attacks, posing a significant threat of IP theft. Yet the robustness of quantized models against these attacks has received little study compared to that of large models. Previous defenses propose injecting calculated noise into the prediction probabilities. However, these defenses are limited because they are not incorporated into model design and are added only as an afterthought after training. Additionally, most defense techniques are computationally expensive and often rest on assumptions about the victim model that are unrealistic for edge-device implementations and do not hold for quantized models. In this paper, we propose DivQAT, a novel algorithm to train quantized CNNs based on Quantization Aware Training (QAT), aiming to enhance their robustness against extraction attacks. To the best of our knowledge, our technique is the first to modify the quantization process to integrate a model extraction defense into the training process. Through empirical validation on benchmark vision datasets, we demonstrate the efficacy of our technique in defending against model extraction attacks without compromising model accuracy. Furthermore, combining our quantization technique with other defense mechanisms improves their effectiveness compared to traditional QAT.
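The summary does not spell out DivQAT's objective, but it builds on Quantization Aware Training, whose core is a "fake-quantization" step: weights are rounded to an integer grid and dequantized during the forward pass so training sees the same rounding error the deployed integer model will. A minimal NumPy sketch of that step (affine, per-tensor quantization; function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Simulate integer quantization during training (the QAT core step):
    map float weights onto a uniform integer grid, then back to floats,
    so the loss reflects the rounding error of the deployed model."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin) or 1.0  # avoid div-by-zero for constant tensors
    zero_point = np.clip(np.round(qmin - w.min() / scale), qmin, qmax)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax)  # integer representation
    return (q - zero_point) * scale                            # dequantized floats

w = np.array([-1.0, -0.3, 0.2, 0.9])
w_q = fake_quantize(w, num_bits=8)  # each entry within half a quantization step of w
```

DivQAT, per the abstract, modifies this quantization process itself so that the trained model resists extraction, rather than bolting a defense onto the outputs after training.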


Key Contributions

  • DivQAT: first method to integrate a model extraction defense directly into the quantization-aware training process rather than as a post-hoc add-on
  • Demonstrates that DivQAT reduces surrogate model fidelity without degrading victim model accuracy on benchmark vision datasets
  • Shows DivQAT is composable with existing defense mechanisms, improving their effectiveness beyond traditional QAT baselines
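The post-hoc defenses DivQAT is said to compose with typically inject calculated noise into the prediction probabilities, as the abstract notes. The paper's specific perturbation is not given here; the sketch below shows only the general pattern such defenses follow (names and the perturbation rule are assumptions for illustration): distort the probability vector an attacker sees while preserving the top-1 label, so benign accuracy is unaffected.

```python
import numpy as np

def perturb_probs(probs, eps=0.1, rng=None):
    """Illustrative post-hoc defense: add bounded noise to the returned
    probability vector, renormalize, and restore the original argmax so
    the predicted label (and thus benign accuracy) is unchanged."""
    rng = rng or np.random.default_rng(0)
    noisy = np.clip(probs + rng.uniform(-eps, eps, size=probs.shape), 1e-6, None)
    noisy /= noisy.sum()
    if noisy.argmax() != probs.argmax():
        i, j = probs.argmax(), noisy.argmax()
        noisy[i], noisy[j] = noisy[j], noisy[i]  # swap to keep the top-1 label
    return noisy

defended = perturb_probs(np.array([0.7, 0.2, 0.1]))
```

Because such defenses only touch the output distribution, they can be layered on top of any training procedure, which is why combining them with a defense baked into training (DivQAT) is possible at all.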

🛡️ Threat Analysis

Model Theft

Directly defends against model extraction attacks — adversaries querying a quantized CNN to clone its functionality. DivQAT modifies the QAT training process to make surrogate models less faithful, protecting model IP.
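The attack being defended against can be sketched in a few lines: the adversary has only query access, labels self-chosen inputs with the victim's responses, and fits a surrogate to those pairs. A toy NumPy version with linear models standing in for the CNNs in the paper (all names and sizes are illustrative):

```python
import numpy as np

# Black-box model extraction in miniature: the attacker never sees
# W_victim, only the victim's outputs on attacker-chosen queries.
rng = np.random.default_rng(1)
W_victim = rng.normal(size=(4, 3))      # secret victim parameters
queries = rng.normal(size=(200, 4))     # attacker-chosen inputs
labels = queries @ W_victim             # victim's responses to the queries
# Fit a surrogate to (query, response) pairs by least squares.
W_surrogate, *_ = np.linalg.lstsq(queries, labels, rcond=None)
fidelity_gap = np.abs(W_surrogate - W_victim).max()  # near zero: clone succeeds
```

With clean, informative responses the surrogate recovers the victim almost exactly; a defense like DivQAT aims to widen this fidelity gap for the attacker without hurting the victim's own accuracy.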


Details

Domains
vision
Model Types
cnn
Threat Tags
black_box, inference_time
Datasets
CIFAR-10, CIFAR-100
Applications
image classification, edge device deployment