Certifiably robust malware detectors by design
Pierre-François Gimenez, Sarath Sivaprasad, Mario Fritz
Published on arXiv: 2508.10038
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Certifiably and empirically robust malware detectors are achievable with only a limited reduction in detection performance compared to non-robust baselines.
ERDALT
Novel technique introduced
Malware analysis examines suspicious software to detect malicious payloads. Static malware analysis, which does not require executing the software, relies increasingly on machine learning to achieve scalability. Although such techniques attain very high detection accuracy, they are easily evaded by adversarial examples: a few modifications to a sample can dupe the detector without changing the software's behavior. Unlike in other domains, such as computer vision, crafting an adversarial malware example without altering its functionality requires specific transformations. We propose a new model architecture for certifiably robust malware detection by design. In addition, we show that every robust detector can be decomposed into a specific structure, which can be applied to learn empirically robust malware detectors, even on fragile features. Our framework ERDALT is based on this structure. We compare and validate these approaches against machine-learning-based malware detection methods, achieving robust detection with only a limited reduction in detection performance.
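The abstract's core idea, robustness that holds by construction because functionality-preserving transformations are limited, can be illustrated with a minimal sketch. This is not the paper's architecture: it assumes a toy setting where the attacker can only *add* features (e.g. padding or extra sections), and shows that a linear detector whose weights on attacker-addable features are clipped to be non-negative is certifiably robust against such attacks, since any addition can only raise the maliciousness score.

```python
class MonotoneDetector:
    """Toy certifiably-robust-by-design detector (illustrative, not ERDALT).

    Assumption: the attacker's functionality-preserving transformations can
    only INCREASE the features marked `addable`. Clipping those weights to be
    >= 0 guarantees that any such attack cannot lower the score, so a sample
    flagged as malware stays flagged: a robustness certificate by construction.
    """

    def __init__(self, weights, bias, addable_mask):
        # Enforce the certificate: non-negative weights on addable features.
        self.w = [max(w, 0.0) if addable else w
                  for w, addable in zip(weights, addable_mask)]
        self.b = bias

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def is_malware(self, x):
        return self.score(x) >= 0.0


# Toy features: [suspicious_api_calls, packed_sections, benign_imports];
# the attacker can add the first two but cannot remove anything.
det = MonotoneDetector(weights=[2.0, 1.5, -0.5], bias=-1.0,
                       addable_mask=[True, True, False])

x_mal = [1.0, 1.0, 0.0]                  # flagged as malware (score 2.5)
x_attacked = [1.0, 4.0, 0.0]             # attacker adds packed sections
assert det.is_malware(x_mal) and det.is_malware(x_attacked)
```

The certificate here is purely structural: no attack search is needed, because monotonicity rules out every feature-addition evasion at once.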
Key Contributions
- New model architecture for certifiably robust malware detection by design, leveraging the constraint that malware-preserving transformations are limited
- Decomposition theorem showing every robust detector can be expressed in a specific structural form, enabling empirical robustness even on fragile features
- ERDALT framework instantiating this structure for empirically robust ML-based static malware detection with limited accuracy trade-off
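Alongside the certified architecture, the contributions mention *empirical* robustness, which is typically evaluated by checking that a detector's verdict survives functionality-preserving transformations. A minimal sketch of such a check follows; the transformation names (`add_padding`, `add_benign_import`) and the dict-based feature encoding are hypothetical stand-ins, not ERDALT's API.

```python
# Hypothetical functionality-preserving transformations on a feature dict.
def add_padding(features):
    f = dict(features)
    f["file_size"] = f.get("file_size", 0) + 4096
    return f

def add_benign_import(features):
    f = dict(features)
    f["benign_imports"] = f.get("benign_imports", 0) + 1
    return f

TRANSFORMS = [add_padding, add_benign_import]

def empirically_robust(detect, sample, transforms=TRANSFORMS):
    """True if the detector's verdict survives every known transformation."""
    verdict = detect(sample)
    return all(detect(t(sample)) == verdict for t in transforms)

# Toy detector that ignores fragile features (size, benign imports) and keys
# on a feature the attacker cannot remove without breaking functionality.
detect = lambda f: f.get("suspicious_api_calls", 0) > 0
sample = {"suspicious_api_calls": 3, "file_size": 10_000}
assert empirically_robust(detect, sample)
```

Unlike the certified case, this only tests the transformations enumerated in `TRANSFORMS`, which is precisely why the paper distinguishes empirical robustness from certified robustness.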
🛡️ Threat Analysis
The paper defends against adversarial examples (evasion attacks) crafted to fool ML-based static malware detectors at inference time without altering malware functionality. The primary contributions — a certifiably robust model architecture by design and the ERDALT empirical robustness framework — are certified robustness defenses explicitly targeting input manipulation attacks.