
NAAMSE: Framework for Evolutionary Security Evaluation of Agents

Kunal Pai 1, Parth Shah 2, Harshil Patel 1

0 citations · 34 references · arXiv (Cornell University)


Published on arXiv

arXiv:2602.07391

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Evolutionary prompt mutation on Gemini 2.5 Flash systematically uncovers high-severity failure modes that one-shot adversarial methods miss, with ablations confirming that the synergy between corpus exploration and targeted mutation drives vulnerability discovery.

NAAMSE

Novel technique introduced


AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that fail to model adaptive, multi-turn adversaries. We propose NAAMSE, an evolutionary framework that reframes agent security evaluation as a feedback-driven optimization problem. Our system employs a single autonomous agent that orchestrates a lifecycle of genetic prompt mutation, hierarchical corpus exploration, and asymmetric behavioral scoring. By using model responses as a fitness signal, the framework iteratively compounds effective attack strategies while simultaneously ensuring "benign-use correctness", preventing the degenerate security of blanket refusal. Our experiments on Gemini 2.5 Flash demonstrate that evolutionary mutation systematically amplifies vulnerabilities missed by one-shot methods, with controlled ablations revealing that the synergy between exploration and targeted mutation uncovers high-severity failure modes. We show that this adaptive approach provides a more realistic and scalable assessment of agent robustness in the face of evolving threats. The code for NAAMSE is open source and available at https://github.com/HASHIRU-AI/NAAMSE.
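The feedback-driven optimization loop described in the abstract can be sketched as a simple genetic search over prompts. This is a minimal illustrative sketch, not NAAMSE's actual implementation: `mutate` and `fitness` are hypothetical placeholders for the framework's genetic prompt mutation operator and behavioral scoring of the target model's responses.

```python
import random

def evolve_attacks(seed_prompts, mutate, fitness,
                   generations=10, pop_size=20, elite=5):
    """Hypothetical sketch of an evolutionary prompt-search loop.

    mutate(prompt)  -> a perturbed variant of the prompt (assumption)
    fitness(prompt) -> score of the target model's response to the prompt,
                       higher meaning a more severe failure (assumption)
    """
    population = list(seed_prompts)
    for _ in range(generations):
        # Rank the current population by how badly it breaks the target model.
        scored = sorted(population, key=fitness, reverse=True)
        # Keep the strongest attacks as parents (elitism).
        parents = scored[:elite]
        # Mutate parents to fill out the next generation, so effective
        # strategies compound across iterations.
        children = [mutate(random.choice(parents))
                    for _ in range(pop_size - elite)]
        population = parents + children
    return max(population, key=fitness)
```

In the real framework this loop is orchestrated by a single autonomous agent and interleaved with hierarchical corpus exploration; the sketch only shows the mutation-and-selection core.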


Key Contributions

  • Single-agent evolutionary framework that uses genetic prompt mutation and hierarchical corpus exploration to iteratively amplify LLM agent vulnerabilities as a feedback-driven optimization problem
  • Asymmetric behavioral scoring that simultaneously measures adversarial success and benign-use correctness, preventing false security from blanket-refusal models
  • Open-source implementation validated on Gemini 2.5 Flash, showing evolutionary mutation uncovers high-severity failure modes missed by one-shot static methods
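The asymmetric scoring idea from the contributions list, that a model must resist attacks *and* stay useful on benign input, can be illustrated with a toy metric. This is a hypothetical sketch of the concept, not the paper's scoring function; the multiplicative combination is an assumption chosen to make blanket refusal score zero.

```python
def asymmetric_score(refused_attacks, total_attacks,
                     answered_benign, total_benign):
    """Illustrative asymmetric behavioral score (assumed form).

    robustness:  fraction of adversarial prompts safely refused
    helpfulness: fraction of benign queries answered correctly
    The product means a blanket-refusal model (helpfulness -> 0)
    cannot appear secure for free.
    """
    robustness = refused_attacks / total_attacks
    helpfulness = answered_benign / total_benign
    return robustness * helpfulness
```

A model refusing every adversarial prompt but also every benign one scores 0 under this metric, which is exactly the degenerate "false security" the contribution targets.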

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
custom corpus (128K adversarial + 50K benign queries from public benchmarks), Gemini 2.5 Flash (target model)
Applications
llm agents, ai agent security evaluation, automated red-teaming