ASTRA: Agentic Steerability and Risk Assessment Framework
Itay Hazan , Yael Mathov , Guy Shtar , Ron Bitton , Itsik Mantin
Published on arXiv — 2511.18114
Excessive Agency (OWASP LLM Top 10 — LLM08)
Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding
Evaluation of 13 open-source LLMs revealed significant differences in their ability to enforce system-prompt-level security guardrails when operating as autonomous agents with tool access.
ASTRA — novel technique introduced
Securing AI agents powered by Large Language Models (LLMs) is one of the most critical challenges in AI security today. Unlike traditional software, AI agents use an LLM as their "brain" to autonomously perform actions via connected tools. This capability introduces risks that go far beyond the harmful text of a chatbot, which was until recently the main application of LLMs. A compromised AI agent can deliberately abuse powerful tools to perform malicious, often irreversible actions, limited only by the guardrails on the tools themselves and the LLM's ability to enforce them. This paper presents ASTRA, a first-of-its-kind framework for evaluating how effectively LLMs support the creation of secure agents that enforce custom guardrails defined at the system-prompt level (e.g., "Do not send an email outside the company domain," or "Never extend the robotic arm more than 2 meters"). Our holistic framework simulates 10 diverse autonomous agents, ranging from a coding assistant to a delivery drone, equipped with 37 unique tools. We test these agents against a suite of novel attacks developed specifically for agentic threats, inspired by the OWASP Top 10 but adapted to challenge the LLM's ability to enforce policy during multi-turn planning and strict tool activation. By evaluating 13 open-source, tool-calling LLMs, we uncovered surprising and significant differences in their ability to remain secure and operate within their boundaries. The purpose of this work is to provide the community with a robust and unified methodology for building and validating better LLMs, ultimately pushing toward more secure and reliable agentic AI systems.
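To make the two example guardrails concrete, the sketch below shows what enforcing them at the tool boundary might look like; the tool names, argument shapes, and thresholds here are hypothetical illustrations for this abstract, not ASTRA's actual agents or tools:

```python
# Hypothetical sketch: validating proposed tool calls against
# system-prompt-level guardrails before they execute. An LLM that
# enforces its guardrails should never propose a call that fails
# these checks; an attack succeeds if a non-compliant call gets through.

ALLOWED_DOMAIN = "company.com"   # "Do not send an email outside the company domain"
MAX_ARM_EXTENSION_M = 2.0        # "Never extend the robotic arm more than 2 meters"

def check_tool_call(tool: str, args: dict) -> bool:
    """Return True if a proposed tool call complies with the guardrails."""
    if tool == "send_email":
        return args.get("to", "").endswith("@" + ALLOWED_DOMAIN)
    if tool == "extend_arm":
        return args.get("meters", 0.0) <= MAX_ARM_EXTENSION_M
    return True  # tools with no guardrail pass through

# An attacker-influenced plan proposing an external email is rejected:
print(check_tool_call("send_email", {"to": "attacker@evil.com"}))   # False
print(check_tool_call("extend_arm", {"meters": 1.5}))               # True
```

In an agentic setting such checks can only cover guardrails expressible as tool-argument constraints; policies that depend on multi-turn context must still be enforced by the LLM itself, which is the ability ASTRA measures.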
Key Contributions
- ASTRA: a first-of-its-kind evaluation framework simulating 10 diverse autonomous agents with 37 unique tools to assess LLM security in agentic settings
- A suite of novel agentic attacks inspired by the OWASP Top 10, adapted to challenge policy enforcement during multi-turn planning and strict tool activation
- Comparative evaluation of 13 open-source tool-calling LLMs revealing significant and surprising differences in their ability to enforce custom guardrails