
Quantization Blindspots: How Model Compression Breaks Backdoor Defenses

Rohan Pandey, Eric Ye

0 citations · 37 references · arXiv


Published on arXiv · 2512.06243

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

INT8 post-training quantization reduces the detection rate of all five evaluated backdoor defenses to 0% while BadNet backdoors survive with attack success rates above 99%, exposing a critical gap between FP32-based defense evaluation and real-world quantized deployment.


Backdoor attacks embed input-dependent malicious behavior into neural networks while preserving high clean accuracy, making them a persistent threat for deployed ML systems. At the same time, real-world deployments almost never serve full-precision models: post-training quantization to INT8 or lower precision is now standard practice for reducing memory and latency. This work asks a simple question: how do existing backdoor defenses behave under standard quantization pipelines? We conduct a systematic empirical study of five representative defenses across three precision settings (FP32, INT8 dynamic, INT4 simulated) and two standard vision benchmarks using a canonical BadNet attack. We observe that INT8 quantization reduces the detection rate of all evaluated defenses to 0% while leaving attack success rates above 99%. For INT4, we find a pronounced dataset dependence: Neural Cleanse remains effective on GTSRB but fails on CIFAR-10, even though backdoors continue to survive quantization with attack success rates above 90%. Our results expose a mismatch between how defenses are commonly evaluated (on FP32 models) and how models are actually deployed (in quantized form), and they highlight quantization robustness as a necessary axis in future evaluations and designs of backdoor defenses.
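The "INT4 simulated" setting in the abstract refers to fake quantization: weights are rounded to a low-bit signed integer grid and immediately dequantized, so the precision loss is modeled without integer kernels. A minimal sketch of symmetric per-tensor fake quantization (function and variable names here are illustrative, not the paper's code):

```python
import numpy as np

def fake_quantize(w, bits):
    """Simulate symmetric per-tensor post-training quantization.

    Weights are snapped to a signed integer grid of the given bit width
    and immediately dequantized, mimicking a "simulated" low-precision
    setting rather than true integer inference.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for INT8, 7 for INT4
    scale = np.max(np.abs(w)) / qmax    # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                    # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
err8 = np.abs(fake_quantize(w, 8) - w).max()
err4 = np.abs(fake_quantize(w, 4) - w).max()
print(err8 < err4)  # INT4 perturbs weights far more than INT8
```

The larger per-weight perturbation at INT4 is consistent with the paper's observation that defense behavior diverges more sharply at that precision, while trigger behavior (encoded redundantly across many weights) still survives.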


Key Contributions

  • Evaluation protocol testing 5 backdoor defenses (Neural Cleanse, Activation Clustering, Spectral Signatures, STRIP, Fine-Pruning) across FP32, INT8-dynamic, and INT4-simulated precision settings on CIFAR-10 and GTSRB
  • Finding that INT8 quantization reduces aggregate defense detection rate from 20% to 0% while backdoors survive with >99% attack success rate
  • Discovery of dataset-dependent INT4 behavior: Neural Cleanse remains effective on GTSRB but fails entirely on CIFAR-10, showing defense robustness properties do not transfer across domains
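The attack success rates quoted above (>99% under INT8, >90% under INT4) are computed in the standard way: the fraction of trigger-stamped inputs that the model maps to the attacker's target class. A minimal sketch with a hypothetical stand-in model (the toy model and trigger geometry are illustrative, not the paper's setup):

```python
import numpy as np

def attack_success_rate(predict, triggered_inputs, target_label):
    """ASR = fraction of triggered inputs classified as the target class.

    `predict` stands in for any classifier, FP32 or quantized; the same
    metric applies at every precision setting.
    """
    preds = np.asarray([predict(x) for x in triggered_inputs])
    return float(np.mean(preds == target_label))

def toy_backdoored_model(x):
    # Purely hypothetical backdoor logic: answer the target class
    # whenever the corner trigger pixel is lit.
    return 7 if x[-1, -1] > 0.9 else int(x.sum()) % 10

imgs = [np.zeros((4, 4)) for _ in range(100)]
for img in imgs:
    img[-1, -1] = 1.0  # stamp a BadNet-style corner trigger
print(attack_success_rate(toy_backdoored_model, imgs, target_label=7))  # 1.0
```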

🛡️ Threat Analysis

Model Poisoning

The primary focus is evaluating five canonical backdoor defenses (Neural Cleanse, Activation Clustering, Spectral Signatures, STRIP, Fine-Pruning) and showing that they break down under standard quantization pipelines, while backdoors injected via BadNets survive quantization intact.
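BadNets-style poisoning operates at training time: a small fraction of the training set is stamped with a fixed trigger patch and relabeled to the attacker's target class. A minimal sketch of that poisoning step (the 10% rate, 3×3 corner trigger, and helper names are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def stamp_trigger(img, size=3, value=1.0):
    """Stamp a BadNet-style square patch in the bottom-right corner."""
    out = img.copy()
    out[-size:, -size:] = value
    return out

def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Trigger-stamp and relabel a random fraction of the training set.

    A minimal sketch of BadNets-style training-time poisoning; a model
    trained on the result learns to map the trigger to `target_label`
    while clean accuracy stays high.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), int(rate * len(images)), replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_label
    return images, labels

X = np.zeros((50, 8, 8), dtype=np.float32)
y = np.zeros(50, dtype=np.int64)
Xp, yp = poison_dataset(X, y, target_label=7, rate=0.1)
print((yp == 7).sum())  # 5 samples poisoned at a 10% rate
```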


Details

Domains
vision
Model Types
cnn
Threat Tags
training_time, digital
Datasets
CIFAR-10, GTSRB
Applications
image classification