benchmark 2026

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

David Schmotz ¹, Luca Beurer-Kellner ², Sahar Abdelnabi ¹, Maksym Andriushchenko ¹

¹ Max Planck Institute for Intelligent Systems

² Snyk

0 citations · 44 references · arXiv (Cornell University)

Published on arXiv

2602.20156

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Frontier LLMs exhibit up to 80% attack success rate against skill-based prompt injection, with Best-of-N attack variants exceeding 50% ASR on most models even under explicit warning policies.

SkillInject

Novel technique introduced

LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely-used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security in terms of harmful instruction avoidance and utility in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable with up to 80% attack success rate with frontier models, often executing extremely harmful instructions including data exfiltration, destructive action, and ransomware-like behavior. They furthermore suggest that this problem will not be solved through model scaling or simple input filtering, but that robust agent security will require context-aware authorization frameworks. Our benchmark is available at https://www.skill-inject.com/.

Key Contributions

SkillInject benchmark of 202 injection-task pairs spanning obviously malicious to subtle context-dependent skill-based attacks, evaluated on Claude Code, Gemini CLI, and OpenAI Codex CLI
Empirical finding that frontier LLMs reach up to 80% attack success rate on skill-based prompt injection, unresolved by model scaling or simple input filtering
Characterization of the dual-use, contextual nature of skill injections and evidence that robust defense requires context-aware authorization frameworks rather than pattern matching

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Datasets

SkillInject (202 injection-task pairs)

Applications

llm agentscode execution agentsai coding assistants

Read PDF arXiv DOI Code

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Systematic Analysis of MCP Security

Quantifying Distributional Robustness of Agentic Tool-Selection

Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols

MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP