Certifiably robust malware detectors by design
Pierre-François Gimenez, Sarath Sivaprasad, Mario Fritz
Published on arXiv: 2508.10038
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Certifiably and empirically robust malware detectors are achievable with only a limited reduction in detection performance compared to non-robust baselines.
ERDALT
Novel technique introduced
Malware analysis examines suspicious software to detect malicious payloads. Static malware analysis, which does not require executing the software, relies increasingly on machine learning to achieve scalability. Although such techniques attain very high detection accuracy, they are easily evaded by adversarial examples: a few modifications to a sample can dupe the detector without changing the software's behavior. Unlike in other domains, such as computer vision, crafting an adversarial malware example without altering its functionality requires specific transformations. We propose a new model architecture for certifiably robust malware detection by design. In addition, we show that every robust detector can be decomposed into a specific structure, which can be applied to learn empirically robust malware detectors, even on fragile features. Our framework ERDALT is based on this structure. We compare and validate these approaches against machine-learning-based malware detection methods, achieving robust detection with only a limited reduction in detection performance.
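The abstract's core idea, robustness that holds by construction because functionality-preserving transformations are limited, can be illustrated with a minimal sketch. This is not the paper's architecture: it assumes a toy setting where the attacker can only *add* features (e.g. padding or extra sections), and shows that a linear detector whose weights on attacker-addable features are clipped to be non-negative is certifiably robust against such attacks, since any addition can only raise the maliciousness score.

```python
class MonotoneDetector:
    """Toy certifiably-robust-by-design detector (illustrative, not ERDALT).

    Assumption: the attacker's functionality-preserving transformations can
    only INCREASE the features marked `addable`. Clipping those weights to be
    >= 0 guarantees that any such attack cannot lower the score, so a sample
    flagged as malware stays flagged: a robustness certificate by construction.
    """

    def __init__(self, weights, bias, addable_mask):
        # Enforce the certificate: non-negative weights on addable features.
        self.w = [max(w, 0.0) if addable else w
                  for w, addable in zip(weights, addable_mask)]
        self.b = bias

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def is_malware(self, x):
        return self.score(x) >= 0.0


# Toy features: [suspicious_api_calls, packed_sections, benign_imports];
# the attacker can add the first two but cannot remove anything.
det = MonotoneDetector(weights=[2.0, 1.5, -0.5], bias=-1.0,
                       addable_mask=[True, True, False])

x_mal = [1.0, 1.0, 0.0]                  # flagged as malware (score 2.5)
x_attacked = [1.0, 4.0, 0.0]             # attacker adds packed sections
assert det.is_malware(x_mal) and det.is_malware(x_attacked)
```

The certificate here is purely structural: no attack search is needed, because monotonicity rules out every feature-addition evasion at once.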
Key Contributions
- New model architecture for certifiably robust malware detection by design, leveraging the constraint that malware-preserving transformations are limited
- Decomposition theorem showing every robust detector can be expressed in a specific structural form, enabling empirical robustness even on fragile features
- ERDALT framework instantiating this structure for empirically robust ML-based static malware detection with limited accuracy trade-off
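Alongside the certified architecture, the contributions mention *empirical* robustness, which is typically evaluated by checking that a detector's verdict survives functionality-preserving transformations. A minimal sketch of such a check follows; the transformation names (`add_padding`, `add_benign_import`) and the dict-based feature encoding are hypothetical stand-ins, not ERDALT's API.

```python
# Hypothetical functionality-preserving transformations on a feature dict.
def add_padding(features):
    f = dict(features)
    f["file_size"] = f.get("file_size", 0) + 4096
    return f

def add_benign_import(features):
    f = dict(features)
    f["benign_imports"] = f.get("benign_imports", 0) + 1
    return f

TRANSFORMS = [add_padding, add_benign_import]

def empirically_robust(detect, sample, transforms=TRANSFORMS):
    """True if the detector's verdict survives every known transformation."""
    verdict = detect(sample)
    return all(detect(t(sample)) == verdict for t in transforms)

# Toy detector that ignores fragile features (size, benign imports) and keys
# on a feature the attacker cannot remove without breaking functionality.
detect = lambda f: f.get("suspicious_api_calls", 0) > 0
sample = {"suspicious_api_calls": 3, "file_size": 10_000}
assert empirically_robust(detect, sample)
```

Unlike the certified case, this only tests the transformations enumerated in `TRANSFORMS`, which is precisely why the paper distinguishes empirical robustness from certified robustness.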
🛡️ Threat Analysis
The paper defends against adversarial examples (evasion attacks) crafted to fool ML-based static malware detectors at inference time without altering malware functionality. The primary contributions — a certifiably robust model architecture by design and the ERDALT empirical robustness framework — are certified robustness defenses explicitly targeting input manipulation attacks.