defense 2026

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

Guijia Zhang 1,2,3, Shu Yang 2,3, Xilin Gong 2,3,4, Di Wang 2,3


Published on arXiv

2604.10286

Prompt Injection (OWASP LLM Top 10: LLM01)

Insecure Plugin Design (OWASP LLM Top 10: LLM07)

Excessive Agency (OWASP LLM Top 10: LLM08)

Key Finding

Calibrated fusion achieves 0.439 high-risk AUPRC on indirect prompt injection attacks, improving over 0.405 for the contextual scorer and 0.380 for the static baseline

STARS

Novel technique introduced


Autonomous language-model agents increasingly rely on installable skills and tools to complete user tasks. Static skill auditing can expose capability surface before deployment, but it cannot determine whether a particular invocation is unsafe under the current user request and runtime context. We therefore study skill invocation auditing as a continuous-risk estimation problem: given a user request, candidate skill, and runtime context, predict a score that supports ranking and triage before a hard intervention is applied. We introduce STARS, which combines a static capability prior, a request-conditioned invocation risk model, and a calibrated risk-fusion policy. To evaluate this setting, we construct SIA-Bench, a benchmark of 3,000 invocation records with group-safe splits, lineage metadata, runtime context, canonical action labels, and derived continuous-risk targets. On a held-out split of indirect prompt injection attacks, calibrated fusion reaches 0.439 high-risk AUPRC, improving over 0.405 for the contextual scorer and 0.380 for the strongest static baseline, while the contextual scorer remains better calibrated with 0.289 expected calibration error. On the locked in-distribution test split, gains are smaller and static priors remain useful. The resulting claim is therefore narrower: request-conditioned auditing is most valuable as an invocation-time risk-scoring and triage layer rather than as a replacement for static screening. Code is available at https://github.com/123zgj123/STARS.
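The abstract describes STARS as fusing a static capability prior with a request-conditioned invocation score into one calibrated continuous risk estimate used for ranking and triage. A minimal sketch of that idea is below; the function and parameter names, the convex-combination fusion, and the logistic (Platt-style) calibration step are all illustrative assumptions, not the paper's actual policy.

```python
import math

def fused_risk(static_prior: float,
               contextual_score: float,
               w: float = 0.5,
               a: float = 4.0,
               b: float = -2.0) -> float:
    """Fuse a static capability prior with a request-conditioned
    invocation score into one continuous risk estimate in (0, 1).

    NOTE: weights and calibration constants here are hypothetical
    placeholders, not values from the STARS paper.
    """
    # Convex combination of the two risk signals.
    raw = w * static_prior + (1.0 - w) * contextual_score
    # Logistic calibration maps the raw score to a calibrated
    # probability of a high-risk invocation.
    return 1.0 / (1.0 + math.exp(-(a * raw + b)))

# Triage: rank candidate skill invocations by fused risk before
# any hard allow/deny intervention is applied.
invocations = [
    {"skill": "read_file",  "static": 0.2, "ctx": 0.1},
    {"skill": "send_email", "static": 0.6, "ctx": 0.9},
]
ranked = sorted(invocations,
                key=lambda r: fused_risk(r["static"], r["ctx"]),
                reverse=True)
```

Scoring-then-ranking, rather than an immediate hard block, matches the paper's framing of auditing as continuous-risk estimation supporting triage.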


Key Contributions

  • STARS framework combining static capability priors, request-conditioned invocation risk model, and calibrated risk-fusion policy for runtime tool auditing
  • SIA-Bench benchmark with 3,000 invocation records including group-safe splits, runtime context, and continuous risk labels
  • Demonstrates request-conditioned auditing improves high-risk AUPRC from 0.380 to 0.439 on held-out indirect prompt injection attacks
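The results also report expected calibration error (0.289 for the contextual scorer). A standard binned ECE computation, which the paper's metric presumably resembles, can be sketched as follows; the bin count and equal-width binning scheme are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned expected calibration error (ECE): the weighted mean
    gap between predicted risk and the observed high-risk rate in
    each confidence bin. Equal-width bins are an assumption; the
    paper may use a different binning scheme.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0
        bins[idx].append((p, y))
    ece, n = 0.0, len(probs)
    for members in bins:
        if not members:
            continue
        conf = sum(p for p, _ in members) / len(members)  # mean predicted risk
        freq = sum(y for _, y in members) / len(members)  # observed positive rate
        ece += (len(members) / n) * abs(conf - freq)
    return ece
```

A lower ECE means the predicted risk scores track the empirical high-risk frequency more closely, which is what makes them usable as triage thresholds.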

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Datasets
SIA-Bench
Applications
autonomous agents, tool-augmented llms, agent safety