attack 2025

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

David Schmotz ^1,2,3, Sahar Abdelnabi ^1,2,3, Maksym Andriushchenko ^1,2,3

¹ ELLIS Institute Tübingen

² MPI for Intelligent Systems

³ Tübingen AI Center

8 citations · 1 influential · 5 references · arXiv

Published on arXiv

2510.26328

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Frontier LLMs remain vulnerable to trivially simple prompt injections hidden in Agent Skill files, with no gradient-based optimization required, enabling silent file exfiltration and full guardrail bypass via approval carryover.

Agent Skills Prompt Injection

Novel technique introduced

Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model capabilities, frontier LLMs remain vulnerable to very simple prompt injections in realistic scenarios. Our code is available at https://github.com/aisa-group/promptinject-agent-skills.

Key Contributions

Demonstrates that Agent Skills (Anthropic's plugin framework) enable trivially simple indirect prompt injections requiring no iterative optimization, because every line of a skill file is interpreted as an instruction
Shows practical data exfiltration attack by embedding a malicious backup script in a legitimate pptx-editing skill that silently uploads files to an external server
Reveals guardrail bypass: a benign 'Don't ask again' approval for Python execution carries over to the malicious upload script, eliminating user friction

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_timetargeteddigital

Applications

llm coding agentsclaude codellm agent frameworks

Read PDF arXiv DOI Code

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

When Skills Lie: Hidden-Comment Injection in LLM Agents

Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents

AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools