NAAMSE: Framework for Evolutionary Security Evaluation of Agents
Kunal Pai 1, Parth Shah 2, Harshil Patel 1
Published on arXiv
2602.07391
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Evolutionary prompt mutation on Gemini 2.5 Flash systematically uncovers high-severity failure modes that one-shot adversarial methods miss, with ablations confirming that the synergy between corpus exploration and targeted mutation drives vulnerability discovery.
Novel technique introduced
NAAMSE
AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that fail to model adaptive, multi-turn adversaries. We propose NAAMSE, an evolutionary framework that reframes agent security evaluation as a feedback-driven optimization problem. Our system employs a single autonomous agent that orchestrates a lifecycle of genetic prompt mutation, hierarchical corpus exploration, and asymmetric behavioral scoring. By using model responses as a fitness signal, the framework iteratively compounds effective attack strategies while simultaneously ensuring "benign-use correctness", preventing the degenerate security of blanket refusal. Our experiments on Gemini 2.5 Flash demonstrate that evolutionary mutation systematically amplifies vulnerabilities missed by one-shot methods, with controlled ablations revealing that the synergy between exploration and targeted mutation uncovers high-severity failure modes. We show that this adaptive approach provides a more realistic and scalable assessment of agent robustness in the face of evolving threats. The code for NAAMSE is open source and available at https://github.com/HASHIRU-AI/NAAMSE.
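The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the NAAMSE implementation: the names `evolve`, `make_mutator`, `toy_fitness`, `TACTICS`, `CORPUS`, and the `explore_prob` parameter are all hypothetical, the corpus is flat rather than hierarchical, and the fitness function is a toy stand-in for sending a prompt to the target agent and scoring its response. The sketch only shows how corpus exploration and mutation of the current best prompt might alternate under a fitness signal.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical placeholder tactics; a real run would apply actual prompt rewrites.
TACTICS = ["[roleplay]", "[encoding]", "[multi-turn]"]
# Hypothetical flat seed corpus (the paper describes a hierarchical one).
CORPUS = ["seed prompt A", "seed prompt B [roleplay]"]


def make_mutator(tag: str) -> Callable[[str], str]:
    """Build a mutator that appends one tactic tag if it is not already present."""
    def mutate(prompt: str) -> str:
        return prompt if tag in prompt else f"{prompt} {tag}"
    return mutate


MUTATORS = [make_mutator(t) for t in TACTICS]


def toy_fitness(prompt: str) -> float:
    """Toy stand-in for 'query the target agent and score its response'.

    Returns the fraction of tactic tags present, a value in [0, 1].
    """
    return sum(tag in prompt for tag in TACTICS) / len(TACTICS)


def evolve(
    corpus: List[str],
    mutators: List[Callable[[str], str]],
    fitness: Callable[[str], float],
    iterations: int = 20,
    explore_prob: float = 0.3,
    seed: int = 0,
) -> Tuple[str, float]:
    """Alternate corpus exploration and mutation of the best prompt so far."""
    rng = random.Random(seed)
    first = rng.choice(corpus)
    population = [(first, fitness(first))]
    for _ in range(iterations):
        if rng.random() < explore_prob:
            # Exploration: draw a fresh prompt from the corpus.
            candidate = rng.choice(corpus)
        else:
            # Targeted mutation: rewrite the highest-scoring prompt seen so far.
            parent = max(population, key=lambda p: p[1])[0]
            candidate = rng.choice(mutators)(parent)
        population.append((candidate, fitness(candidate)))
    # Return the best (prompt, score) pair found.
    return max(population, key=lambda p: p[1])
```

Calling `evolve(CORPUS, MUTATORS, toy_fitness)` returns the highest-scoring prompt and its score. Setting `explore_prob` to 0 or 1 disables one of the two branches, which is one simple way to mimic an exploration-versus-mutation ablation in this toy setting.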
Key Contributions
- Single-agent evolutionary framework that treats agent security evaluation as a feedback-driven optimization problem, using genetic prompt mutation and hierarchical corpus exploration to iteratively amplify LLM agent vulnerabilities
- Asymmetric behavioral scoring that simultaneously measures adversarial success and benign-use correctness, preventing false security from blanket-refusal models
- Open-source implementation validated on Gemini 2.5 Flash, showing that evolutionary mutation uncovers high-severity failure modes missed by one-shot static methods
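To make the idea of asymmetric behavioral scoring concrete, here is a small sketch of one possible penalty scheme. The `Outcome` type, the `asymmetric_score` and `total_penalty` functions, and the weights `harm_weight` and `refusal_weight` are assumptions made for illustration; the summary above does not specify the actual scoring rule or its weights. The point is only that an agent which refuses everything still incurs a penalty on benign prompts.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outcome:
    adversarial: bool  # True if the prompt was an attack, False if benign
    complied: bool     # True if the agent carried out the request


def asymmetric_score(
    outcome: Outcome,
    harm_weight: float = 1.0,     # assumed penalty for complying with an attack
    refusal_weight: float = 0.5,  # assumed penalty for refusing a benign request
) -> float:
    """Return a penalty for one interaction; higher means worse behavior."""
    if outcome.adversarial and outcome.complied:
        return harm_weight
    if not outcome.adversarial and not outcome.complied:
        return refusal_weight
    # Refusing an attack or helping with a benign request costs nothing.
    return 0.0


def total_penalty(outcomes: Iterable[Outcome]) -> float:
    """Sum the penalties over a batch of interactions."""
    return sum(asymmetric_score(o) for o in outcomes)
```

Under these assumed weights, an agent that refuses every prompt is penalized for each benign request it turns down, so blanket refusal does not register as perfect security.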