Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu 1, Zhihao Chen 2, Yanjun Zhang 3, Gelei Deng 4, Yuekang Li 5, Jianting Ning 6, Leo Zhang 3

2 citations · 1 influential · 35 references · arXiv (Cornell University)

Published on arXiv

2602.06547

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

157 confirmed malicious skills with 632 vulnerabilities found in the wild; a single threat actor accounts for 54.1% of cases via templated brand impersonation; responsible disclosure achieved 93.6% removal within 30 days.

SkillScan

Novel technique introduced


Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user privileges and are distributed through community registries with minimal vetting, but no ground-truth dataset exists to characterize the resulting threats. We construct the first labeled dataset of malicious agent skills by behaviorally verifying 98,380 skills from two community registries, confirming 157 malicious skills with 632 vulnerabilities. These attacks are not incidental. Malicious skills average 4.03 vulnerabilities across a median of three kill chain phases, and the ecosystem has split into two archetypes: Data Thieves that exfiltrate credentials through supply chain techniques, and Agent Hijackers that subvert agent decision-making through instruction manipulation. A single actor accounts for 54.1% of confirmed cases through templated brand impersonation. Shadow features, capabilities absent from public documentation, appear in 0% of basic attacks but 100% of advanced ones; several skills go further by exploiting the AI platform's own hook system and permission flags. Responsible disclosure led to 93.6% removal within 30 days. We release the dataset and analysis pipeline to support future work on agent skill security.


Key Contributions

  • First labeled dataset of 157 confirmed malicious agent skills with 632 vulnerabilities behaviorally verified from 98,380 skills across two community registries
  • Taxonomy of two attack archetypes — Data Thieves (supply chain credential exfiltration) and Agent Hijackers (instruction manipulation to subvert LLM agent decisions) — with kill chain phase analysis showing average 4.03 vulnerabilities per skill
  • Discovery that shadow features (undocumented capabilities) appear in 0% of basic attacks but 100% of advanced ones, plus release of the full three-tiered dataset and SkillScan analysis pipeline
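The shadow-feature finding lends itself to a simple static check: compare the capabilities a skill advertises in its documentation with the capabilities its code actually exercises. The sketch below is illustrative only; the capability markers, the `capability:` documentation format, and the function names are assumptions for exposition, not SkillScan's actual rule set.

```python
import re

def declared_capabilities(readme: str) -> set[str]:
    """Capabilities a skill advertises in its docs (hypothetical format:
    lines like 'capability: network_access')."""
    return set(re.findall(r"capability:\s*(\w+)", readme))

def observed_capabilities(code: str) -> set[str]:
    """Rough static markers for capabilities the code actually uses.
    The marker table is illustrative, not the paper's rule set."""
    markers = {
        "network_access": ("requests.", "urllib", "socket."),
        "file_read": ("open(", "pathlib"),
        "subprocess_exec": ("subprocess", "os.system"),
        "env_read": ("os.environ",),
    }
    return {cap for cap, pats in markers.items()
            if any(p in code for p in pats)}

def shadow_features(readme: str, code: str) -> set[str]:
    """Capabilities present in code but absent from the docs -- the study
    reports these in 100% of advanced attacks and 0% of basic ones."""
    return observed_capabilities(code) - declared_capabilities(readme)
```

For example, a skill whose docs declare only `file_read` but whose code imports `subprocess` would be flagged with the shadow feature `subprocess_exec`.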

🛡️ Threat Analysis

AI Supply Chain Attacks

The distribution of malicious skills through community registries with minimal vetting is a textbook supply chain attack on the LLM agent ecosystem; the 'Data Thieves' archetype explicitly uses supply chain techniques to exfiltrate credentials, and a single threat actor conducts templated brand impersonation across the registry.
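Templated brand impersonation, the tactic behind the single actor's 54.1% share, can be surfaced with basic name-similarity screening against known brands. The snippet below is a minimal sketch: the brand list, the 0.7 threshold, and the use of `difflib.SequenceMatcher` are all assumptions for illustration, not the study's methodology.

```python
from difflib import SequenceMatcher

# Illustrative brand list; a real screen would use the registry's actual
# high-value targets.
KNOWN_BRANDS = ["stripe", "openai", "github", "slack"]

def impersonation_score(skill_name: str) -> tuple[str, float]:
    """Highest similarity between a skill's name and any known brand."""
    name = skill_name.lower()
    best = max(KNOWN_BRANDS,
               key=lambda b: SequenceMatcher(None, name, b).ratio())
    return best, SequenceMatcher(None, name, best).ratio()

def flag_impersonators(names, threshold=0.7):
    """Skills whose names closely mimic a brand without matching it exactly,
    the signature of a templated impersonation campaign."""
    flagged = []
    for n in names:
        brand, score = impersonation_score(n)
        if threshold <= score < 1.0:
            flagged.append((n, brand, round(score, 2)))
    return flagged
```

A typosquatted name like "strlpe" scores high against "stripe" and is flagged, while an unrelated name like "weatherbot" passes.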


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
community agent skill registries (98,380 skills, 2 registries)
Applications
llm agent platforms, agent skill registries, ai assistant plugins