
Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks

Milad Nasr 1,2, Yanick Fratantonio 3, Luca Invernizzi 3, Ange Albertini 3, Loua Farah 3, Alex Petit-Bianco 3, Andreas Terzis 1, Kurt Thomas 3, Elie Bursztein 3, Nicholas Carlini 1,4

1 citation · 35 references · CCS


Published on arXiv: 2510.01676

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Modifying just 13 bytes of a malware sample achieves 90% evasion of Gmail's ML-based routing component; the deployed adversarial defense reduces attacker success to 20% even with a 50-byte budget.


As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level vulnerabilities with real-world impact. This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system, performing a case-study analysis of Gmail's pipeline, where file-type identification relies on an ML model. The malware detection pipeline used by Gmail contains a machine learning model that routes each potential malware sample to a specialized malware classifier to improve accuracy and performance. This model, called Magika, has been open sourced. By designing adversarial examples that fool Magika, we can cause the production malware service to incorrectly route malware to an unsuitable malware detector, thereby increasing our chance of evading detection. Specifically, by changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases, thereby allowing us to send malware files over Gmail. We then turn our attention to defenses and develop an approach to mitigate the severity of these types of attacks. For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate. We implemented this defense, and, thanks to a collaboration with Google engineers, it has already been deployed in production for the Gmail classifier.


Key Contributions

  • Demonstrates that changing just 13 bytes of a malware sample evades Magika with 90% success, allowing malware to bypass Gmail's entire malware detection pipeline via misrouting
  • Shows adversarial transferability from the open-source Magika model to the production Gmail classifier despite potential model differences
  • Proposes and collaboratively deploys a production defense with Google that limits adversarial success to 20% even when an attacker is permitted 50 bytes of modification

🛡️ Threat Analysis

Input Manipulation Attack

Crafts minimal adversarial byte-level perturbations that fool Magika (an open-source ML classifier) at inference time, causing misrouting in Gmail's production malware detection pipeline; also develops and deploys an adversarial training defense that reduces attack success rate to 20% even with 50-byte modifications.
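The core idea above can be sketched in a few lines. The snippet below is a hypothetical illustration only: `toy_classifier` is a stand-in for an ML file-type router like Magika (which the paper attacks with gradient-guided search on the open-source model), and `evade` is a simplified greedy single-byte search, not the paper's actual method. Against a real model the paper reports needing roughly 13 modified bytes.

```python
def toy_classifier(data: bytes) -> str:
    """Hypothetical stand-in for an ML file-type router such as Magika:
    labels content "pdf" when it begins with the PDF magic bytes."""
    return "pdf" if data.startswith(b"%PDF") else "unknown"

def evade(data: bytes, classify) -> bytes:
    """Search for a minimal byte-level perturbation that flips the
    classifier's predicted label (here: exhaustive single-byte flips,
    a toy simplification of the paper's gradient-based attack)."""
    original = classify(data)
    sample = bytearray(data)
    for pos in range(len(sample)):
        old = sample[pos]
        for candidate in range(256):
            if candidate == old:
                continue
            sample[pos] = candidate
            if classify(bytes(sample)) != original:
                return bytes(sample)  # label flipped with one modified byte
        sample[pos] = old  # restore this byte and try the next position
    return bytes(data)  # no single-byte evasion found within this sketch
```

A misrouted sample would then be scanned by a detector specialized for the wrong file type, which is what degrades end-to-end detection even though the downstream classifiers themselves are unchanged.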


Details

Domains
nlp
Model Types
transformer
Threat Tags
white_box · black_box · inference_time · targeted · digital
Applications
malware detection · file-type classification · email security