
Rounding-Guided Backdoor Injection in Deep Learning Model Quantization

Xiangxiang Chen¹, Peixin Zhang², Jun Sun², Wenhai Wang¹, Jingyi Wang¹

0 citations · 86 references · arXiv


Published on arXiv · 2510.09647

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

QuRA achieves nearly 100% attack success rate in most evaluated settings with negligible clean accuracy degradation, and can adapt to bypass existing backdoor defenses.

QuRA

Novel technique introduced


Model quantization is a popular technique for deploying deep learning models in resource-constrained environments. However, it may also introduce previously overlooked security risks. In this work, we present QuRA, a novel backdoor attack that exploits model quantization to embed malicious behaviors. Unlike conventional backdoor attacks relying on training data poisoning or model training manipulation, QuRA works solely through the quantization operations. In particular, QuRA first employs a novel weight selection strategy to identify critical weights that influence the backdoor target (with the goal of preserving the model's overall performance in mind). Then, by optimizing the rounding direction of these weights, we amplify the backdoor effect across model layers without degrading accuracy. Extensive experiments demonstrate that QuRA achieves nearly 100% attack success rates in most cases, with negligible performance degradation. Furthermore, we show that QuRA can adapt to bypass existing backdoor defenses, underscoring its threat potential. Our findings highlight a critical vulnerability in the widely used model quantization process, emphasizing the need for more robust security measures. Our implementation is available at https://github.com/cxx122/QuRA.
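The core idea can be illustrated with a minimal sketch: in uniform quantization, every weight sits between two adjacent quantized values, so an attacker who controls the quantizer may round selected weights *away* from nearest while keeping the perturbation within one quantization step. The snippet below is a simplified illustration under that assumption, not the paper's actual implementation; the `flip_mask` stands in for QuRA's weight selection strategy, whose criterion is not reproduced here.

```python
import numpy as np

def quantize_with_rounding_mask(w, scale, flip_mask):
    """Uniform quantization where weights marked in flip_mask round to the
    other adjacent integer instead of to nearest (hypothetical sketch of
    rounding-direction control; QuRA's selection/optimization is not shown)."""
    q = w / scale
    nearest = np.round(q)
    # The other adjacent quantization level: still a valid rounding,
    # so the change stays within one quantization step of the original.
    other = np.where(q >= nearest, nearest + 1, nearest - 1)
    q_final = np.where(flip_mask, other, nearest)
    return q_final * scale

# Toy example: flip the rounding direction of one "critical" weight.
w = np.array([0.23, -0.41, 0.87, 0.05])
scale = 0.1
mask = np.array([False, True, False, False])
wq = quantize_with_rounding_mask(w, scale, mask)
```

Note that both rounding choices are legitimate outputs of a quantizer, which is what makes the attack stealthy: every dequantized weight remains within `scale` of its original value, so clean accuracy is barely affected even as the accumulated flips amplify the backdoor across layers.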


Key Contributions

  • Novel backdoor attack (QuRA) that injects malicious behavior exclusively through quantization rounding direction optimization, requiring no access to training data or model retraining
  • Weight selection strategy that identifies critical weights influencing the backdoor target while preserving clean accuracy
  • Demonstrated adaptability to bypass existing backdoor defenses, achieving nearly 100% ASR with negligible clean accuracy degradation

🛡️ Threat Analysis

Model Poisoning

QuRA is a backdoor injection attack that embeds hidden, trigger-activated malicious behavior into models by optimizing the rounding direction of critical weights during quantization. The model behaves normally on clean inputs and activates the backdoor only on trigger inputs — a textbook ML10 attack. While the paper motivates the work with a supply chain threat model (malicious quantization tools on HuggingFace), the primary contribution is the backdoor injection technique itself (weight rounding optimization), not a supply chain compromise method, so ML10 alone is appropriate per the stated guidelines.


Details

Domains
vision, nlp
Model Types
cnn, transformer
Threat Tags
white_box, training_time, targeted, digital
Datasets
CIFAR-10, CIFAR-100, ImageNet, GTSRB
Applications
image classification, deep learning model deployment, edge/mobile model compression