Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations
Published on arXiv
2604.21310
Input Manipulation Attack
OWASP ML Top 10 — ML01
Model Skewing
OWASP ML Top 10 — ML08
Key Finding
Similarity constraints reduce output drift signals, with $\ell_2$ regularization showing the most promising results in the evasion-detectability trade-off.
Similarity-Constrained Adversarial Perturbations
Novel technique introduced
Deep learning has emerged as a powerful approach for malware detection, demonstrating impressive accuracy across various data representations. However, these models face critical limitations in real-world, non-stationary environments where both malware characteristics and detection systems continuously evolve. Our research investigates a fundamental security question: Can an attacker generate adversarial malware samples that simultaneously evade classification and remain inconspicuous to drift monitoring mechanisms? We propose a novel approach that generates targeted adversarial examples in the classifier's standardized feature space, augmented with sophisticated similarity regularizers. By carefully constraining perturbations to maintain distributional similarity with clean malware, we create an optimization objective that balances targeted misclassification with drift signal minimization. We quantify the effectiveness of this approach by comprehensively comparing classifier output probabilities using multiple drift metrics. Our experiments demonstrate that similarity constraints can reduce output drift signals, with $\ell_2$ regularization showing the most promising results. We observe that perturbation budget significantly influences the evasion-detectability trade-off, with increased budget leading to higher attack success rates and more substantial drift indicators.
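The objective described in the abstract (targeted misclassification plus a similarity penalty, under a perturbation budget) can be illustrated with a small sketch. This is not the authors' implementation: the logistic-regression classifier, the function name `constrained_attack`, and the parameters `eps` (an assumed $\ell_\infty$ budget), `lam` (the $\ell_2$ penalty weight), `lr`, and `steps` are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def constrained_attack(x, w, b, eps=0.5, lam=0.1, lr=0.05, steps=200):
    """Push standardized malware features x toward the benign class (label 0).

    Hypothetical setup: a linear classifier with P(malware) = sigmoid(x @ w + b).
    Per-sample loss: -log(1 - p(x + delta)) + lam * ||delta||_2^2,
    with delta clipped to the assumed l-infinity budget [-eps, eps].
    """
    delta = np.zeros_like(x)
    for _ in range(steps):
        p = sigmoid((x + delta) @ w + b)        # current malware probability
        # d/d(delta) of -log(1 - p) is p * w; the l2 penalty adds 2 * lam * delta
        grad = p[:, None] * w[None, :] + 2.0 * lam * delta
        delta = np.clip(delta - lr * grad, -eps, eps)
    return x + delta

# Toy usage with made-up weights and synthetic standardized features
rng = np.random.default_rng(0)
x_clean = rng.normal(size=(50, 4))
w_demo, b_demo = np.array([1.5, -2.0, 0.5, 1.0]), 0.5
x_adv = constrained_attack(x_clean, w_demo, b_demo)
```

Raising `lam` in this sketch keeps the perturbed samples closer to the clean ones at the cost of a weaker push toward the benign class, while raising `eps` loosens the budget. This mirrors, in a simplified form, the trade-off the abstract reports.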
Key Contributions
- Adversarial malware generation that simultaneously evades detection and minimizes drift signals
- Similarity-constrained perturbation approach balancing evasion success with distributional similarity to genuine malware
- Empirical analysis showing $\ell_2$ regularization reduces output drift signals while maintaining attack effectiveness
🛡️ Threat Analysis
Generates adversarial perturbations in feature space that cause malware to be misclassified as benign at inference time, which is a core adversarial evasion attack.
Explicitly targets drift monitoring mechanisms in non-stationary environments by crafting adversarial samples that remain inconspicuous to drift detection, which amounts to model skewing through drift exploitation.
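To make "drift signal" concrete, the sketch below compares two sets of classifier output probabilities using a two-sample Kolmogorov-Smirnov statistic and a population stability index. The summary does not name the drift metrics the paper uses, so these two are assumptions chosen as common examples, and the function names and the `bins` and `floor` parameters are hypothetical.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def psi(reference, current, bins=10, floor=1e-6):
    """Population stability index over equal-width bins on [0, 1]."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    ref = np.histogram(reference, bins=edges)[0] / reference.size
    cur = np.histogram(current, bins=edges)[0] / current.size
    # Floor empty bins so the logarithm stays finite
    ref, cur = np.clip(ref, floor, None), np.clip(cur, floor, None)
    return float(np.sum((cur - ref) * np.log(cur / ref)))

# Toy usage: synthetic "clean" scores versus scores shifted toward benign
rng = np.random.default_rng(1)
p_clean = rng.beta(8, 2, size=500)     # made-up malware scores, mostly high
p_shifted = rng.beta(2, 8, size=500)   # made-up post-attack scores, mostly low
drift_ks = ks_statistic(p_clean, p_shifted)
drift_psi = psi(p_clean, p_shifted)
```

Both quantities are zero when the two score distributions coincide and grow as they diverge. An attacker who wants to stay inconspicuous to this kind of monitor would aim to keep such values low while still flipping the classifier's decision.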