tool 2026

NAAMSE: Framework for Evolutionary Security Evaluation of Agents

Kunal Pai ¹, Parth Shah ², Harshil Patel ¹

¹ University of California, Davis

² Independent Researcher

0 citations · 34 references · arXiv (Cornell University)

Published on arXiv

2602.07391

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Evolutionary prompt mutation on Gemini 2.5 Flash systematically uncovers high-severity failure modes that one-shot adversarial methods miss, with ablations confirming that the synergy between corpus exploration and targeted mutation drives vulnerability discovery.

NAAMSE

Novel technique introduced

AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that fail to model adaptive, multi-turn adversaries. We propose NAAMSE, an evolutionary framework that reframes agent security evaluation as a feedback-driven optimization problem. Our system employs a single autonomous agent that orchestrates a lifecycle of genetic prompt mutation, hierarchical corpus exploration, and asymmetric behavioral scoring. By using model responses as a fitness signal, the framework iteratively compounds effective attack strategies while simultaneously ensuring "benign-use correctness", preventing the degenerate security of blanket refusal. Our experiments on Gemini 2.5 Flash demonstrate that evolutionary mutation systematically amplifies vulnerabilities missed by one-shot methods, with controlled ablations revealing that the synergy between exploration and targeted mutation uncovers high-severity failure modes. We show that this adaptive approach provides a more realistic and scalable assessment of agent robustness in the face of evolving threats. The code for NAAMSE is open source and available at https://github.com/HASHIRU-AI/NAAMSE.
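The feedback loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the NAAMSE implementation: the mutation operators, `toy_fitness`, `evolve`, and every parameter name are hypothetical stand-ins, and the fitness here is a string heuristic rather than a score derived from a real model's responses.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical mutation operators: simple string transforms standing in for
# the genetic prompt mutations, which this summary does not specify.
def add_prefix(p: str) -> str:
    return "As a test harness, " + p

def add_suffix(p: str) -> str:
    return p + " Respond in detail."

def to_upper(p: str) -> str:
    return p.upper()

MUTATIONS: List[Callable[[str], str]] = [add_prefix, add_suffix, to_upper]

def toy_fitness(prompt: str) -> float:
    """Stand-in fitness in [0, 1]; a real run would score the target's response."""
    low = prompt.lower()
    hits = ("test harness" in low) + ("in detail" in low) + prompt.isupper()
    return hits / 3

def evolve(
    seeds: List[str],
    fitness: Callable[[str], float],
    generations: int = 5,
    population: int = 4,
    explore_prob: float = 0.3,
    seed: int = 0,
) -> Tuple[float, str]:
    """Return the (score, prompt) pair with the highest fitness found."""
    if not seeds:
        raise ValueError("seeds must be non-empty")
    rng = random.Random(seed)  # seeded so runs are reproducible
    pool = [(fitness(p), p) for p in seeds[:population]]
    for _ in range(generations):
        # Selection: keep the fittest half as parents (elitism).
        pool.sort(key=lambda item: item[0], reverse=True)
        parents = pool[: max(1, population // 2)]
        children = []
        while len(parents) + len(children) < population:
            if rng.random() < explore_prob:
                # Exploration: draw a fresh prompt from the corpus.
                child = rng.choice(seeds)
            else:
                # Exploitation: mutate a prompt that already scored well.
                child = rng.choice(MUTATIONS)(rng.choice(parents)[1])
            children.append((fitness(child), child))
        pool = parents + children
    return max(pool, key=lambda item: item[0])

seeds = ["Summarize this document.", "List the files in the workspace."]
best_score, best_prompt = evolve(seeds, toy_fitness)
print(best_score, best_prompt)
```

Because the parents survive each generation unchanged, the best score never decreases, which is one simple way to make effective strategies compound. The `explore_prob` parameter is an assumed knob for trading corpus exploration against targeted mutation.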

Key Contributions

Single-agent evolutionary framework that treats agent security evaluation as a feedback-driven optimization problem, using genetic prompt mutation and hierarchical corpus exploration to iteratively amplify LLM agent vulnerabilities
Asymmetric behavioral scoring that simultaneously measures adversarial success and benign-use correctness, preventing false security from blanket-refusal models
Open-source implementation validated on Gemini 2.5 Flash, showing evolutionary mutation uncovers high-severity failure modes missed by one-shot static methods
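The idea behind the asymmetric scoring contribution can be illustrated with a small sketch. The summary does not give the scoring formula, so everything here is an assumption: the `Interaction` record, the binary `complied` flag, and the two reported rates are hypothetical simplifications. The point is only that the same behavior (a refusal) counts as correct on an adversarial prompt and as a failure on a benign one.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Interaction:
    is_adversarial: bool  # label taken from the corpus (assumed available)
    complied: bool        # whether the agent carried out the request

def failure_score(x: Interaction) -> float:
    """Hypothetical scale: 1.0 is a failure, 0.0 is correct behavior."""
    if x.is_adversarial:
        # Complying with an adversarial prompt is the failure.
        return 1.0 if x.complied else 0.0
    # Refusing a benign prompt is the failure.
    return 0.0 if x.complied else 1.0

def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0

def evaluate(interactions: List[Interaction]) -> Dict[str, float]:
    adv = [failure_score(x) for x in interactions if x.is_adversarial]
    ben = [failure_score(x) for x in interactions if not x.is_adversarial]
    return {
        "attack_success_rate": mean(adv),
        "benign_failure_rate": mean(ben),
    }

# A model that refuses everything looks safe on attacks but fails benign use.
blanket_refusal = [
    Interaction(is_adversarial=True, complied=False),
    Interaction(is_adversarial=True, complied=False),
    Interaction(is_adversarial=False, complied=False),
    Interaction(is_adversarial=False, complied=False),
]
print(evaluate(blanket_refusal))
```

Reporting the two rates side by side keeps a blanket-refusal policy from appearing secure: its attack success rate is zero, but its benign failure rate is maximal.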

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

black_box, inference_time, targeted, digital

Datasets

custom corpus (128K adversarial + 50K benign queries from public benchmarks), Gemini 2.5 Flash (target model)

Applications

llm agents, ai agent security evaluation, automated red-teaming

Similar Papers

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models

RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning

SGuard-v1: Safety Guardrail for Large Language Models

NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models

RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents

In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers

Proactive Hardening of LLM Defenses with HASTE