Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
Published on arXiv
arXiv:2508.03949
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Compressed code language models consistently exhibit significantly reduced adversarial robustness compared to their uncompressed counterparts, with knowledge distillation causing the most severe degradation across all tasks and attacks.
Transformer-based language models for code have shown remarkable performance in various software analytics tasks, but their adoption is hindered by high computational costs, slow inference speeds, and substantial environmental impact. Model compression techniques such as pruning, quantization, and knowledge distillation have gained traction in addressing these challenges. However, the impact of these strategies on the robustness of compressed language models for code in adversarial scenarios remains poorly understood. Understanding how these compressed models behave under adversarial attacks is essential for their safe and effective deployment in real-world applications. To bridge this knowledge gap, we conduct a comprehensive evaluation of how common compression strategies affect the adversarial robustness of compressed models. We assess the robustness of compressed versions of three widely used language models for code across three software analytics tasks, using six evaluation metrics and four commonly used classical adversarial attacks. Our findings indicate that compressed models generally maintain comparable performance to their uncompressed counterparts. However, when subjected to adversarial attacks, compressed models exhibit significantly reduced robustness. These results reveal a trade-off between model size reduction and adversarial robustness, underscoring the need for careful consideration when deploying compressed models in security-critical software applications. Our study highlights the need for further research into compression strategies that strike a balance between computational efficiency and adversarial robustness, which is essential for deploying reliable language models for code in real-world software applications.
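To make the trade-off concrete, here is a minimal sketch of one way a robustness gap can be quantified: relative accuracy drop when clean inputs are replaced by adversarially perturbed ones. The function names and the toy classifier/data below are illustrative stand-ins, not the paper's actual metrics or models:

```python
def accuracy(predict, examples):
    """Fraction of (input, label) pairs the model classifies correctly."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

def robustness_drop(predict, clean, adversarial):
    """Relative accuracy degradation when clean inputs are swapped for
    their adversarially perturbed counterparts."""
    clean_acc = accuracy(predict, clean)
    adv_acc = accuracy(predict, adversarial)
    return (clean_acc - adv_acc) / clean_acc

# Toy stand-in classifier and data; per the study's findings, a compressed
# model would typically show a larger drop than its uncompressed counterpart.
predict = lambda x: x > 0
clean = [(1.0, True), (2.0, True), (-1.0, False), (-2.0, False)]
perturbed = [(1.0, True), (-0.5, True), (-1.0, False), (0.5, False)]
drop = robustness_drop(predict, clean, perturbed)  # 0.5 on this toy data
```

The same drop can be computed per compression strategy and per attack, which is essentially how a clean-vs-adversarial trade-off table is populated.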
Key Contributions
- Comprehensive empirical evaluation of how pruning, quantization, and knowledge distillation affect the adversarial robustness of code language models (CodeBERT, CodeGPT, PLBART)
- Reveals a systematic trade-off: compressed models maintain near-original clean accuracy but suffer significantly greater performance degradation under adversarial attacks, with knowledge-distilled models showing the most pronounced vulnerability
- Evaluation spans three software analytics tasks (clone detection, code summarization, vulnerability detection) using six metrics and four established adversarial attacks
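Two of the weight-level compression strategies listed above can be sketched in a few lines. This is a generic illustration of unstructured magnitude pruning and symmetric 8-bit linear quantization, assuming a plain nested-list weight matrix; it is not the exact configuration the study applied to CodeBERT, CodeGPT, or PLBART:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(sparsity * len(flat))
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

def quantize_int8(weights):
    """Symmetric linear quantization to 8-bit integers: returns (q, scale)."""
    scale = max(abs(w) for row in weights for w in row) / 127.0
    q = [[round(w / scale) for w in row] for row in weights]
    return q, scale

W = [[0.9, -0.05, 0.3], [-0.7, 0.02, 0.4]]
pruned = magnitude_prune(W, sparsity=0.5)      # half the weights zeroed
q, scale = quantize_int8(W)                    # int8 weights + one fp scale
dequantized = [[v * scale for v in row] for row in q]
```

Both transforms shrink the model while keeping clean accuracy close to the original, which is exactly why the robustness loss the study reports is easy to overlook. Knowledge distillation, the third strategy, instead trains a smaller student model on the teacher's outputs and is not reducible to a weight transform like this.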
🛡️ Threat Analysis
The study evaluates four input manipulation attacks (ALERT, BeamAttack, MHM, WIR-Random) that craft adversarial code inputs causing misclassification or incorrect outputs at inference time, directly assessing input manipulation robustness of compressed models.
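To give a feel for this attack family, here is a minimal WIR-Random-style sketch: randomly rename identifiers (a semantics-preserving edit) until the model's prediction flips. The `model_predict` stub, the snippet, and the substitute vocabulary are hypothetical stand-ins for illustration, not the paper's actual models or attack implementations:

```python
import random
import re

def model_predict(code: str) -> int:
    """Toy stand-in for a (compressed) code classifier; any callable
    mapping source text to a label would fit here."""
    return 1 if "buffer" in code else 0

def rename_identifier(code: str, old: str, new: str) -> str:
    """Semantics-preserving rename of one identifier (word-boundary match)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def wir_random_attack(code, identifiers, substitutes, trials=20, seed=0):
    """Randomly rename identifiers until the prediction flips; returns the
    adversarial variant, or None if no flip is found within the budget."""
    rng = random.Random(seed)
    original = model_predict(code)
    for _ in range(trials):
        adv = rename_identifier(code, rng.choice(identifiers), rng.choice(substitutes))
        if model_predict(adv) != original:
            return adv
    return None

snippet = "def copy(buffer, n):\n    return buffer[:n]"
adv = wir_random_attack(snippet, ["buffer", "n"], ["tmp", "data"])
```

Because the rename leaves program behavior unchanged, any prediction flip is purely an artifact of the model's brittleness, which is what makes these attacks a clean probe of the compression/robustness trade-off.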