LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors
Tianwei Lan , Farid Naït-Abdesselam
Published on arXiv (arXiv:2512.21404)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
LAMLAD achieves up to a 97% attack success rate against three ML-based Android malware detectors, requiring on average only three perturbation attempts per adversarial sample.
LAMLAD
Novel technique introduced
The rapid growth in both the scale and complexity of Android malware has driven the widespread adoption of machine learning (ML) techniques for scalable and accurate malware detection. Despite their effectiveness, these models remain vulnerable to adversarial attacks that introduce carefully crafted feature-level perturbations to evade detection while preserving malicious functionality. In this paper, we present LAMLAD, a novel adversarial attack framework that exploits the generative and reasoning capabilities of large language models (LLMs) to bypass ML-based Android malware classifiers. LAMLAD employs a dual-agent architecture composed of an LLM manipulator, which generates realistic and functionality-preserving feature perturbations, and an LLM analyzer, which guides the perturbation process toward successful evasion. To improve efficiency and contextual awareness, LAMLAD integrates retrieval-augmented generation (RAG) into the LLM pipeline. Focusing on Drebin-style feature representations, LAMLAD enables stealthy and high-confidence attacks against widely deployed Android malware detection systems. We evaluate LAMLAD against three representative ML-based Android malware detectors and compare its performance with two state-of-the-art adversarial attack methods. Experimental results demonstrate that LAMLAD achieves an attack success rate (ASR) of up to 97%, requiring on average only three attempts per adversarial sample, highlighting its effectiveness, efficiency, and adaptability in practical adversarial settings. Furthermore, we propose an adversarial training-based defense strategy that reduces the ASR by more than 30% on average, significantly enhancing model robustness against LAMLAD-style attacks.
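The dual-agent loop described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: a random feature-addition heuristic stands in for the LLM manipulator, a score check against a toy linear detector stands in for the LLM analyzer, and all feature names and weights are invented.

```python
import random

# Toy linear detector over binary Drebin-style features (invented weights);
# a positive score means "malware". Stands in for the black-box ML classifier.
WEIGHTS = {"SEND_SMS": 2.0, "READ_CONTACTS": 1.5, "INTERNET": 0.1,
           "BIND_WALLPAPER": -2.5, "SET_ALARM": -1.5, "NFC": -0.8}

def detector_score(features):
    return sum(WEIGHTS.get(f, 0.0) for f in features)

def propose_perturbation(features, candidates, rng):
    """Manipulator stand-in: only ADD features, never remove any,
    so the APK's malicious functionality is preserved."""
    unused = [c for c in candidates if c not in features]
    return features | {rng.choice(unused)} if unused else features

def evasion_loop(features, candidates, max_attempts=10, seed=0):
    """Analyzer stand-in: after each perturbation, query the detector
    and stop as soon as the sample is classified benign."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        features = propose_perturbation(features, candidates, rng)
        if detector_score(features) <= 0.0:
            return features, attempt
    return features, None

malware = {"SEND_SMS", "READ_CONTACTS", "INTERNET"}
benign_pool = ["BIND_WALLPAPER", "SET_ALARM", "NFC"]
adv, attempts = evasion_loop(malware, benign_pool)
```

Note the additive-only constraint in the manipulator: since removing a feature (e.g. a permission the payload needs) could break the malware, realistic feature-level evasion only injects benign-looking features.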
Key Contributions
- LAMLAD: a dual-agent LLM framework (manipulator + analyzer) that iteratively generates functionality-preserving adversarial feature perturbations to evade Android malware detectors
- Integration of RAG into the adversarial LLM pipeline to improve efficiency and contextual awareness of feature semantics
- Adversarial training defense that reduces LAMLAD's attack success rate by over 30% on average
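The RAG contribution can also be sketched abstractly. This is an assumption-laden illustration: the feature descriptions are invented, and a token-overlap scorer stands in for whatever retriever the paper actually uses to ground the LLM manipulator in feature semantics.

```python
# Hypothetical corpus of Drebin feature descriptions (invented text) that
# the retriever draws on before prompting the manipulator LLM.
FEATURE_DOCS = {
    "SEND_SMS": "permission to send SMS messages abused by premium SMS malware",
    "READ_CONTACTS": "permission to read the contact list abused for data theft",
    "BIND_WALLPAPER": "permission used by benign live wallpaper apps",
    "SET_ALARM": "permission to set alarms in benign utility apps",
}

def retrieve(query, docs, k=2):
    """Rank feature docs by token overlap with the query (embedding
    similarity would be used in practice) and return the top-k names."""
    q = set(query.lower().split())
    scored = sorted(docs.items(),
                    key=lambda kv: (-len(q & set(kv[1].lower().split())), kv[0]))
    return [name for name, _ in scored[:k]]

def build_prompt(sample_features, docs):
    """Augment the manipulator's prompt with retrieved feature semantics,
    steering it toward realistic, benign-looking additions."""
    context = retrieve("benign permission utility apps", docs)
    ctx = "\n".join(f"- {f}: {docs[f]}" for f in context)
    return (f"Sample features: {sorted(sample_features)}\n"
            f"Context:\n{ctx}\n"
            "Propose one benign-looking feature to add.")

prompt = build_prompt({"SEND_SMS", "READ_CONTACTS"}, FEATURE_DOCS)
```

The design point is that retrieval narrows the LLM's choices to features whose semantics are plausible for a benign app, which is what gives the attack its realism and low attempt count.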
🛡️ Threat Analysis
LAMLAD crafts adversarial feature-level perturbations (Drebin-style features) at inference time to cause ML-based malware classifiers to misclassify malicious APKs as benign — a direct adversarial evasion attack on ML models.
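The paper's proposed mitigation, adversarial training, can be sketched at toy scale. This is an illustrative example under invented assumptions, not the authors' setup: a from-scratch perceptron plays the detector, the feature sets and labels are fabricated, and the single evasive sample stands in for a batch of LAMLAD-generated adversarial examples folded back into the training set.

```python
# Binary Drebin-style feature space (invented names) and a tiny perceptron
# standing in for the ML detector; label 1 = malware, 0 = benign.
FEATS = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "BIND_WALLPAPER", "SET_ALARM"]

def vec(feature_set):
    return [1.0 if f in feature_set else 0.0 for f in FEATS]

def train(samples, labels, epochs=50, lr=0.1):
    """Plain perceptron training loop; converges on separable toy data."""
    w, b = [0.0] * len(FEATS), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(model, feature_set):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, vec(feature_set))) + b > 0 else 0

malware_train = [{"SEND_SMS", "READ_CONTACTS"}, {"SEND_SMS", "INTERNET"}]
benign_train = [{"SEND_SMS", "BIND_WALLPAPER"}, {"SET_ALARM"}]
X = [vec(s) for s in malware_train + benign_train]
y = [1, 1, 0, 0]
base = train(X, y)

# Evasive variant: a malware sample padded with a benign-looking feature,
# which the baseline model misclassifies as benign.
evasive = {"SEND_SMS", "READ_CONTACTS", "BIND_WALLPAPER"}

# Defense: append the adversarial sample with its true label and retrain.
hardened = train(X + [vec(evasive)], y + [1])
```

Retraining on the adversarial sample forces the boundary to discount the padded benign feature, so the hardened model flags the evasive variant while still classifying the original benign apps correctly, which is the mechanism behind the paper's reported 30%+ average ASR reduction.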