defense 2025

A Safety and Security Framework for Real-World Agentic Systems

Shaona Ghosh ¹, Barnaby Simkin ¹, Kyriacos Shiarlis ², Soumili Nandi ¹, Dan Zhao ¹, Matthew Fiedler ², Julia Bazinska ², Nikki Pope ¹, Roopa Prabhu ¹, Daniel Rohrer , Michael Demoret ¹, Bartley Richardson ¹

¹ NVIDIA

² Lakera AI

2 citations · 30 references · arXiv

Published on arXiv

2511.21990

Excessive Agency

OWASP LLM Top 10 — LLM08

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Framework identifies and contextually mitigates novel agentic risks in NVIDIA's AI-Q Research Assistant, validated through over 10,000 realistic attack and defense execution traces

Dynamic Agentic Safety and Security Framework

Novel technique introduced

This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and security are not merely fixed attributes of individual models but also emergent properties arising from the dynamic interactions among models, orchestrators, tools, and data within their operating environments. We propose a new way of identification of novel agentic risks through the lens of user safety. Although, for traditional LLMs and agentic models in isolation, safety and security has a clear separation, through the lens of safety in agentic systems, they appear to be connected. Building on this foundation, we define an operational agentic risk taxonomy that unifies traditional safety and security concerns with novel, uniquely agentic risks, including tool misuse, cascading action chains, and unintended control amplification among others. At the core of our approach is a dynamic agentic safety and security framework that operationalizes contextual agentic risk management by using auxiliary AI models and agents, with human oversight, to assist in contextual risk discovery, evaluation, and mitigation. We further address one of the most challenging aspects of safety and security of agentic systems: risk discovery through sandboxed, AI-driven red teaming. We demonstrate the framework effectiveness through a detailed case study of NVIDIA flagship agentic research assistant, AI-Q Research Assistant, showcasing practical, end-to-end safety and security evaluations in complex, enterprise-grade agentic workflows. This risk discovery phase finds novel agentic risks that are then contextually mitigated. We also release the dataset from our case study, containing traces of over 10,000 realistic attack and defense executions of the agentic workflow to help advance research in agentic safety.

Key Contributions

Operational agentic risk taxonomy unifying traditional LLM safety/security concerns with novel agentic risks: tool misuse, cascading action chains, and unintended control amplification
Dynamic agentic safety and security framework using auxiliary AI agents with human oversight for contextual risk discovery, evaluation, and mitigation across the agentic development lifecycle
Sandboxed AI-driven red teaming methodology validated on NVIDIA's AI-Q Research Assistant, releasing a dataset of 10,000+ realistic attack and defense execution traces

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Datasets

Nemotron-AIQ-Agentic-Safety-Dataset-1.0

Applications

enterprise agentic ai systemsllm research assistants

Read PDF arXiv DOI Code

A Safety and Security Framework for Real-World Agentic Systems

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

Authenticated Workflows: A Systems Approach to Protecting Agentic AI

Systems Security Foundations for Agentic Computing

Evaluating Privilege Usage of Agents on Real-World Tools

From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent

Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains

Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management