Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations

Deep learning has emerged as a powerful approach for malware detection, achieving high accuracy across a range of data representations. However, these models face critical limitations in real-world, non-stationary environments where both malware characteristics and detection systems continuously evolve. We investigate a fundamental security question: can an attacker generate adversarial malware samples that both evade classification and remain inconspicuous to drift monitoring mechanisms? We propose an approach that generates targeted adversarial examples in the classifier's standardized feature space, augmented with similarity regularizers. By constraining perturbations to maintain distributional similarity with clean malware, we obtain an optimization objective that balances targeted misclassification against drift signal minimization. We quantify the effectiveness of this approach by comparing classifier output probabilities under multiple drift metrics. Our experiments show that similarity constraints can reduce output drift signals, with $\ell_2$ regularization showing the most promising results. The perturbation budget strongly influences the evasion-detectability trade-off: a larger budget yields higher attack success rates but also more substantial drift indicators.

Key Contributions

Adversarial malware generation that simultaneously evades detection and minimizes drift signals
Similarity-constrained perturbation approach balancing evasion success with distributional similarity to genuine malware
Empirical analysis showing $\ell_2$ regularization reduces output drift signals while maintaining attack effectiveness

🛡️ Threat Analysis

Input Manipulation Attack

Generates adversarial perturbations in feature space to cause misclassification of malware as benign at inference time — core adversarial evasion attack.
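
To make the shape of such an attack concrete, here is a minimal sketch of a targeted, similarity-regularized perturbation in feature space. It is not the paper's implementation. It assumes a linear-logistic surrogate classifier, an $\ell_\infty$ perturbation budget `eps`, and a squared $\ell_2$ penalty toward a reference point `x_ref` (for instance the centroid of clean malware features). The function name, the parameters, and the exact form of the regularizer are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def similarity_constrained_attack(x, w, b, x_ref, eps=0.5, lam=0.1,
                                  step=0.05, n_steps=100):
    """Push a malware sample toward the benign class (label 0).

    Illustrative objective (an assumption, not the paper's exact loss):
        -log(1 - p(x + delta)) + lam * ||x + delta - x_ref||_2^2
    subject to ||delta||_inf <= eps, solved by projected gradient descent.
    """
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        x_adv = x + delta
        p = sigmoid(x_adv @ w + b)              # P(malware | x_adv)
        grad_cls = p * w                        # gradient of -log(1 - p)
        grad_sim = 2.0 * lam * (x_adv - x_ref)  # gradient of the l2 penalty
        delta = delta - step * (grad_cls + grad_sim)
        delta = np.clip(delta, -eps, eps)       # project onto the budget
    return x + delta

# Toy usage in an already standardized 3-feature space (values are made up).
w = np.array([1.0, -0.5, 2.0])
b = 0.0
x = np.array([1.0, 0.0, 1.0])
eps = 0.5
x_adv = similarity_constrained_attack(x, w, b, x_ref=x, eps=eps)
p_clean = sigmoid(x @ w + b)
p_adv = sigmoid(x_adv @ w + b)
print(f"P(malware) clean={p_clean:.3f} adversarial={p_adv:.3f}")
```

In this sketch, raising `eps` gives the optimizer more room to lower the malware score, while raising `lam` pulls the adversarial sample back toward `x_ref`. That is the same evasion-versus-similarity tension the abstract describes.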

Model Skewing

Explicitly targets drift monitoring mechanisms in non-stationary environments, crafting adversarial samples that remain inconspicuous to drift detection — this is model skewing through drift exploitation.
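
As an illustration of what an output-level drift signal can look like, the sketch below compares two sets of classifier output probabilities using two common statistics: the two-sample Kolmogorov-Smirnov statistic and the population stability index. The summary above does not name the drift metrics used in the paper, so both choices, along with the function names and the synthetic scores, are assumptions made for this example.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def population_stability_index(ref, cur, n_bins=10, floor=1e-6):
    """PSI over equal-width probability bins on [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ref_frac = np.histogram(ref, bins=edges)[0] / ref.size
    cur_frac = np.histogram(cur, bins=edges)[0] / cur.size
    # Floor empty bins so the logarithm stays finite.
    ref_frac = np.clip(ref_frac, floor, None)
    cur_frac = np.clip(cur_frac, floor, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Synthetic scores: clean malware scored high, evasive samples scored low.
rng = np.random.default_rng(0)
clean_scores = rng.beta(8, 2, size=1000)
adv_scores = rng.beta(2, 8, size=1000)
print("KS :", ks_statistic(clean_scores, adv_scores))
print("PSI:", population_stability_index(clean_scores, adv_scores))
```

Both statistics are zero when the two score sets are identical and grow as the output distributions separate. An attacker trying to stay inconspicuous would want to keep values like these small while still flipping the predicted label.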