Defense · 2026

AgenTRIM: Tool Risk Mitigation for Agentic AI

Roy Betser, Shamik Bose, Amit Giloni, Chiara Picardi, Sindhu Padakandla, Roman Vainshtein

4 citations · 62 references · arXiv


Published on arXiv · 2601.12449

Prompt Injection (OWASP LLM Top 10 — LLM01)

Excessive Agency (OWASP LLM Top 10 — LLM08)

Key Finding

AgenTRIM substantially reduces indirect prompt injection attack success rate on AgentDojo while maintaining high task utility, without modifying the agent's internal reasoning.

AgenTRIM

Novel technique introduced


AI agents are autonomous systems that combine LLMs with external tools to solve complex tasks. While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents may retain unnecessary permissions (excessive agency) or fail to invoke required tools (insufficient agency), amplifying the attack surface and reducing performance. We introduce AgenTRIM, a framework for detecting and mitigating tool-driven agency risks without altering an agent's internal reasoning. AgenTRIM addresses these risks through complementary offline and online phases. Offline, AgenTRIM reconstructs and verifies the agent's tool interface from code and execution traces. At runtime, it enforces per-step least-privilege tool access through adaptive filtering and status-aware validation of tool calls. Evaluating on the AgentDojo benchmark, AgenTRIM substantially reduces attack success while maintaining high task performance. Additional experiments show robustness to description-based attacks and effective enforcement of explicit safety policies. Together, these results demonstrate that AgenTRIM provides a practical, capability-preserving approach to safer tool use in LLM-based agents.
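The runtime phase described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the class name, matching heuristic, and parameter checks are assumptions used to show the two ideas the abstract names, per-step least-privilege filtering and validation of tool calls against a verified inventory.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    # One entry in the verified tool inventory (built offline in the paper).
    name: str
    description: str
    required_params: frozenset

class LeastPrivilegeOrchestrator:
    """Hypothetical sketch of an online orchestrator: expose only the tools
    a step plausibly needs, then validate each call before execution."""

    def __init__(self, inventory):
        self.inventory = {t.name: t for t in inventory}

    def filter_tools(self, step_intent):
        # Adaptive filtering (illustrative heuristic): expose only tools
        # whose description overlaps the current step's intent keywords.
        words = set(step_intent.lower().split())
        return [t for t in self.inventory.values()
                if words & set(t.description.lower().split())]

    def validate_call(self, name, params, allowed):
        # Validation: reject calls to tools not exposed at this step,
        # and calls that omit required parameters.
        if name not in {t.name for t in allowed}:
            return False, "tool not permitted at this step"
        spec = self.inventory[name]
        if not spec.required_params <= set(params):
            return False, "missing required parameters"
        return True, "ok"
```

An injected instruction asking the agent to, say, transfer money during an email-reading step would fail at `validate_call`, because the transfer tool was never exposed for that step.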


Key Contributions

  • Characterizes 'unbalanced tool-driven agency' (excessive and insufficient agency) as a unified security risk for LLM agents
  • Offline tool extractor that combines deterministic code/trace analysis with LLM-assisted generation to build a verified tool inventory
  • Online tool orchestrator enforcing per-step least-privilege tool access via adaptive filtering and status-aware validation, achieving state-of-the-art attack success rate reduction on AgentDojo
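The deterministic half of the offline extractor can be illustrated with standard signature introspection. This is a minimal sketch under the assumption that tools are plain Python functions; the paper's extractor also uses execution traces and LLM-assisted generation, which are not shown here.

```python
import inspect

def extract_tool_inventory(tool_functions):
    """Illustrative deterministic extraction: derive each tool's name,
    description, and required parameters from its Python signature."""
    inventory = {}
    for fn in tool_functions:
        sig = inspect.signature(fn)
        required = [p.name for p in sig.parameters.values()
                    if p.default is inspect.Parameter.empty]
        inventory[fn.__name__] = {
            "description": (fn.__doc__ or "").strip(),
            "required_params": required,
        }
    return inventory

# Hypothetical tool function used only to demonstrate extraction.
def send_email(to, subject, body, cc=None):
    """Send an email to a recipient."""
```

Running `extract_tool_inventory([send_email])` yields an entry whose required parameters are `to`, `subject`, and `body` (optional `cc` is excluded), giving the orchestrator a ground-truth interface to validate runtime calls against.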

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Datasets
AgentDojo
Applications
llm-based ai agents, agentic ai systems