Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
David Schmotz 1, Luca Beurer-Kellner 2, Sahar Abdelnabi 1, Maksym Andriushchenko 1
Published on arXiv
2602.20156
Prompt Injection
OWASP LLM Top 10 — LLM01
Insecure Plugin Design
OWASP LLM Top 10 — LLM07
Key Finding
Frontier LLMs exhibit up to 80% attack success rate against skill-based prompt injection, with Best-of-N attack variants exceeding 50% ASR on most models even under explicit warning policies.
SkillInject
Novel technique introduced
LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely-used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security in terms of harmful instruction avoidance and utility in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable with up to 80% attack success rate with frontier models, often executing extremely harmful instructions including data exfiltration, destructive action, and ransomware-like behavior. They furthermore suggest that this problem will not be solved through model scaling or simple input filtering, but that robust agent security will require context-aware authorization frameworks. Our benchmark is available at https://www.skill-inject.com/.
Key Contributions
- SkillInject benchmark of 202 injection-task pairs spanning obviously malicious to subtle context-dependent skill-based attacks, evaluated on Claude Code, Gemini CLI, and OpenAI Codex CLI
- Empirical finding that frontier LLMs reach up to 80% attack success rate on skill-based prompt injection, unresolved by model scaling or simple input filtering
- Characterization of the dual-use, contextual nature of skill injections and evidence that robust defense requires context-aware authorization frameworks rather than pattern matching