attack arXiv Oct 5, 2025 · Oct 2025
Xiangxiang Chen, Peixin Zhang, Jun Sun et al. · Zhejiang University · Singapore Management University
QuRA injects backdoors into neural networks solely via quantization rounding direction manipulation, achieving ~100% ASR without training data access
Model Poisoning visionnlp
Model quantization is a popular technique for deploying deep learning models on resource-constrained environments. However, it may also introduce previously overlooked security risks. In this work, we present QuRA, a novel backdoor attack that exploits model quantization to embed malicious behaviors. Unlike conventional backdoor attacks relying on training data poisoning or model training manipulation, QuRA solely works using the quantization operations. In particular, QuRA first employs a novel weight selection strategy to identify critical weights that influence the backdoor target (with the goal of perserving the model's overall performance in mind). Then, by optimizing the rounding direction of these weights, we amplify the backdoor effect across model layers without degrading accuracy. Extensive experiments demonstrate that QuRA achieves nearly 100% attack success rates in most cases, with negligible performance degradation. Furthermore, we show that QuRA can adapt to bypass existing backdoor defenses, underscoring its threat potential. Our findings highlight critical vulnerability in widely used model quantization process, emphasizing the need for more robust security measures. Our implementation is available at https://github.com/cxx122/QuRA.
cnn transformer Zhejiang University · Singapore Management University
defense arXiv Nov 11, 2025 · Nov 2025
Ruihan Zhang, Jun Sun, Ee-Peng Lim et al. · Singapore Management University
Defends user data from unauthorized ML training by provably maximizing Bayes error via projected gradient ascent, robust to clean-data mixing
Data Poisoning Attack vision
The recent success of machine learning models, especially large-scale classifiers and language models, relies heavily on training with massive data. These data are often collected from online sources. This raises serious concerns about the protection of user data, as individuals may not have given consent for their data to be used in training. To address this concern, recent studies introduce the concept of unlearnable examples, i.e., data instances that appear natural but are intentionally altered to prevent models from effectively learning from them. While existing methods demonstrate empirical effectiveness, they typically rely on heuristic trials and lack formal guarantees. Besides, when unlearnable examples are mixed with clean data, as is often the case in practice, their unlearnability disappears. In this work, we propose a novel approach to constructing unlearnable examples by systematically maximising the Bayes error, a measurement of irreducible classification error. We develop an optimisation-based approach and provide an efficient solution using projected gradient ascent. Our method provably increases the Bayes error and remains effective when the unlearning examples are mixed with clean samples. Experimental results across multiple datasets and model architectures are consistent with our theoretical analysis and show that our approach can restrict data learnability, effectively in practice.
cnn transformer Singapore Management University