defense 2026

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Zhenxiong Yu ¹, Zhi Yang ¹, Zhiheng Jin ¹, Shuhe Wang ², Heng Zhang ³, Yanlin Fei ⁴, Lingfeng Zeng ¹, Fangqi Lou ¹, Shuo Zhang ³, Tu Hu ³, Jingping Liu ⁵, Rongze Chen ³, Xingyu Zhu ⁶, Kunyi Wang ³, Chaofa Yuan ³, Xin Guo ¹, Zhaowei Liu ¹, Feipeng Zhang ⁷, Jie Huang ¹, Huacan Wang ³, Ronghao Chen ³, Liwen Zhang ¹

¹ SUFE

² NUS

³ QuantaAlpha

⁴ CMU

⁵ SYSU

⁶ USTC

⁷ XJTU

0 citations · 28 references · arXiv (Cornell University)

Published on arXiv

2602.05386

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Achieves the lowest Attack Success Rate and False Positive Rate among the compared defenses, with only 8.3% latency overhead over an undefended baseline

Spider-Sense (Intrinsic Risk Sensing)

Novel technique introduced

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S²Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3%.
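The three quantities reported in the abstract (ASR, FPR, and latency overhead) can be made concrete with a small sketch. The definitions below are the conventional ones and are an assumption here, since this summary does not reproduce the paper's exact formulas; the `Episode` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One evaluated agent run (hypothetical record, not from the paper)."""
    is_attack: bool          # ground truth: was this run adversarial?
    attack_succeeded: bool   # for attack runs: did the attack reach its goal?
    blocked: bool            # did the defense intervene on this run?


def attack_success_rate(episodes: List[Episode]) -> float:
    """Assumed definition: fraction of attack runs in which the attack succeeded."""
    attacks = [e for e in episodes if e.is_attack]
    if not attacks:
        return 0.0
    return sum(e.attack_succeeded for e in attacks) / len(attacks)


def false_positive_rate(episodes: List[Episode]) -> float:
    """Assumed definition: fraction of benign runs that the defense blocked."""
    benign = [e for e in episodes if not e.is_attack]
    if not benign:
        return 0.0
    return sum(e.blocked for e in benign) / len(benign)


def latency_overhead(defended_seconds: float, baseline_seconds: float) -> float:
    """Relative extra latency of the defended agent over the undefended one."""
    return (defended_seconds - baseline_seconds) / baseline_seconds
```

Under these assumed definitions, a defended run taking 1.083 s against a 1.0 s baseline corresponds to an overhead of about 8.3%.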

Key Contributions

Spider-Sense framework using Intrinsic Risk Sensing (IRS) for event-driven, selective defense — agents trigger security checks only upon perceiving risk rather than at fixed mandatory checkpoints
Hierarchical defense mechanism combining lightweight similarity matching for known attack patterns with deep internal reasoning for ambiguous cases, eliminating reliance on external guard models
S²Bench: a lifecycle-aware benchmark with realistic tool execution and multi-stage attack scenarios for rigorous evaluation of LLM agent defenses
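
The selective, two-tier flow described in the first two contributions can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-based `senses_risk` trigger, the pattern list, the thresholds, and the `deep_reasoner` callback are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the paper actually uses.

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple

# Hypothetical library of known attack patterns (illustrative only).
KNOWN_ATTACK_PATTERNS = [
    "ignore all previous instructions",
    "send the contents of the file to this address",
]

# Hypothetical cues that make the agent "sense" risk. In the paper the agent
# perceives risk intrinsically; a keyword check is only a stand-in here.
RISK_CUES = ("ignore", "password", "send", "override")


def senses_risk(observation: str) -> bool:
    """Stand-in trigger: return True if any risk cue appears in the text."""
    text = observation.lower()
    return any(cue in text for cue in RISK_CUES)


def similarity(a: str, b: str) -> float:
    """Cheap string similarity in [0, 1] (stand-in for the paper's matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(
    observation: str,
    deep_reasoner: Callable[[str], str],
    match_threshold: float = 0.8,  # assumed value, not from the paper
) -> Tuple[str, str]:
    """Return (decision, stage) for one observation.

    Stage 0: no risk sensed, so no check runs at all.
    Stage 1: lightweight similarity match against known patterns.
    Stage 2: ambiguous cases are escalated to the deep reasoner.
    """
    if not senses_risk(observation):
        return "allow", "not_triggered"
    best = max(similarity(observation, p) for p in KNOWN_ATTACK_PATTERNS)
    if best >= match_threshold:
        return "block", "pattern_match"
    return deep_reasoner(observation), "deep_reasoning"
```

A caller would pass the agent's own reasoning step as `deep_reasoner`, for example `screen(tool_output, lambda text: "allow")` as a placeholder. The point of the layering is that most observations exit at the first or second stage, so the expensive reasoning path runs only for inputs that are both flagged and unmatched.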

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

Datasets

S²Bench

Applications

autonomous llm agents, tool-using agents

Similar Papers

Securing AI Agents: Implementing Role-Based Access Control for Industrial Applications

AgentSentinel: An End-to-End and Real-Time Security Defense Framework for Computer-Use Agents

Policy Compiler for Secure Agentic Systems

BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability

Optimizing Agent Planning for Security and Autonomy

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Agent-Sentry: Bounding LLM Agents via Execution Provenance

A2AS: Agentic AI Runtime Security and Self-Defense