Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare
Saikat Maiti 1,2
Published on arXiv (2603.17419)
- AI Supply Chain Attacks (OWASP ML Top 10: ML06)
- Prompt Injection (OWASP LLM Top 10: LLM01)
- Excessive Agency (OWASP LLM Top 10: LLM08)
Key Finding
A 90-day deployment in which an automated security audit agent discovered four HIGH-severity findings, the fleet was progressively hardened across three VM image generations, and defense coverage was mapped to all eleven attack patterns from recent red-teaming literature
Zero Trust Security Architecture for Autonomous AI Agents
Novel technique introduced
Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communication. Recent red-teaming research demonstrates that these agents exhibit critical vulnerabilities in realistic settings: unauthorized compliance with non-owner instructions, sensitive information disclosure, identity spoofing, cross-agent propagation of unsafe practices, and indirect prompt injection through external resources [7]. In healthcare environments processing Protected Health Information, every such vulnerability becomes a potential HIPAA violation. This paper presents a security architecture deployed for nine autonomous AI agents in production at a healthcare technology company. We develop a six-domain threat model for agentic AI in healthcare covering credential exposure, execution-capability abuse, network egress exfiltration, prompt integrity failures, database access risks, and fleet configuration drift. We implement four-layer defense in depth: (1) kernel-level workload isolation using gVisor on Kubernetes, (2) credential proxy sidecars that prevent agent containers from accessing raw secrets, (3) network egress policies restricting each agent to allowlisted destinations, and (4) a prompt integrity framework with structured metadata envelopes and untrusted-content labeling. We report results from 90 days of deployment, including four HIGH-severity findings discovered and remediated by an automated security audit agent, progressive fleet hardening across three VM image generations, and defense coverage mapped to all eleven attack patterns from recent literature. All configurations, audit tooling, and the prompt integrity framework are released as open source.
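To make the first three layers concrete, the following is a minimal Kubernetes sketch of the kind of configuration the abstract describes. All resource names, labels, images, and the allowlisted CIDR are illustrative placeholders, not the paper's released configurations: a gVisor `RuntimeClass` for kernel-level isolation, an agent pod whose raw secrets are mounted only into a credential-proxy sidecar, and a default-deny egress `NetworkPolicy` with an explicit allowlist.

```yaml
# Layer 1: register gVisor's runsc handler as a RuntimeClass so agent
# pods run against a user-space kernel instead of the host kernel.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
# Layer 2: agent pod sketch. The agent container never sees raw secrets;
# the credential-proxy sidecar holds them and injects auth on its behalf.
apiVersion: v1
kind: Pod
metadata:
  name: agent-example               # illustrative name
  labels:
    app: agent
spec:
  runtimeClassName: gvisor          # kernel-level workload isolation
  containers:
    - name: agent
      image: registry.example.com/agent:latest        # placeholder image
      env:
        - name: UPSTREAM_PROXY      # agent talks only to the local sidecar
          value: "http://127.0.0.1:8080"
    - name: credential-proxy        # sole holder of mounted secrets
      image: registry.example.com/cred-proxy:latest   # placeholder image
      volumeMounts:
        - name: api-credentials
          mountPath: /secrets
          readOnly: true
  volumes:
    - name: api-credentials
      secret:
        secretName: agent-api-credentials
---
# Layer 3: selecting a pod with policyTypes: [Egress] denies all egress
# by default; each rule below re-opens one allowlisted destination.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-allowlist
spec:
  podSelector:
    matchLabels:
      app: agent
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32   # placeholder allowlisted API endpoint
      ports:
        - protocol: TCP
          port: 443
```

In this pattern the agent's only network path to credentials is through the sidecar on localhost, so even a fully compromised agent process cannot read secret material from disk or environment.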
Key Contributions
- Six-domain threat model mapping agentic AI vulnerabilities to HIPAA Security Rule provisions for healthcare deployments
- Four-layer defense-in-depth architecture with gVisor kernel isolation, credential proxy sidecars, network egress policies, and prompt integrity framework
- Automated security audit agent that discovered and remediated four HIGH-severity findings in a 90-day production deployment
- Open-source release of all Kubernetes configurations, audit tooling, and prompt integrity framework
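The fourth layer, the prompt integrity framework with structured metadata envelopes, might take a shape like the sketch below. All field names here are hypothetical; the open-sourced framework defines the actual schema. The core idea is that every message segment carries provenance metadata, and anything fetched from an external resource is explicitly labeled untrusted so the agent treats it as data, never as instructions.

```yaml
# Hypothetical prompt-envelope schema (field names are illustrative,
# not the paper's actual format).
envelope:
  version: "1.0"
  sender:
    identity: ops-team@example.com   # placeholder principal
    role: owner                      # only owner-role instructions are binding
    verified: true
  segments:
    - trust: trusted                 # operator-authored instruction
      type: instruction
      content: "Summarize today's audit log."
    - trust: untrusted               # indirect content: data, never instructions
      type: external_resource
      source: "https://example.com/fetched-page"
      content: "IGNORE PREVIOUS INSTRUCTIONS ..."   # injection attempt, inert as labeled data
```

Labeling at the envelope level lets the defense address both direct prompt injection (non-owner senders) and indirect injection through external resources with one mechanism.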
🛡️ Threat Analysis
Addresses the security of the AI agent infrastructure and deployment pipeline in production, including credential management, configuration drift, and fleet security, protecting the AI system's operational environment.