Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu 1, Zhihao Chen 2, Yanjun Zhang 3, Gelei Deng 4, Yuekang Li 5, Jianting Ning 6, Leo Zhang 3

2 citations · 1 influential · 35 references · arXiv (Cornell University)

Published on arXiv

2602.06547

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

157 confirmed malicious skills with 632 vulnerabilities found in the wild; a single threat actor accounts for 54.1% of cases via templated brand impersonation; responsible disclosure achieved 93.6% removal within 30 days.

SkillScan

Novel technique introduced


Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user privileges and are distributed through community registries with minimal vetting, but no ground-truth dataset exists to characterize the resulting threats. We construct the first labeled dataset of malicious agent skills by behaviorally verifying 98,380 skills from two community registries, confirming 157 malicious skills with 632 vulnerabilities. These attacks are not incidental. Malicious skills average 4.03 vulnerabilities across a median of three kill chain phases, and the ecosystem has split into two archetypes: Data Thieves that exfiltrate credentials through supply chain techniques, and Agent Hijackers that subvert agent decision-making through instruction manipulation. A single actor accounts for 54.1% of confirmed cases through templated brand impersonation. Shadow features, capabilities absent from public documentation, appear in 0% of basic attacks but 100% of advanced ones; several skills go further by exploiting the AI platform's own hook system and permission flags. Responsible disclosure led to 93.6% removal within 30 days. We release the dataset and analysis pipeline to support future work on agent skill security.


Key Contributions

  • First labeled dataset of 157 confirmed malicious agent skills with 632 vulnerabilities behaviorally verified from 98,380 skills across two community registries
  • Taxonomy of two attack archetypes — Data Thieves (supply chain credential exfiltration) and Agent Hijackers (instruction manipulation to subvert LLM agent decisions) — with kill chain phase analysis showing average 4.03 vulnerabilities per skill
  • Discovery that shadow features (undocumented capabilities) appear in 0% of basic attacks but 100% of advanced ones, plus release of the full three-tiered dataset and SkillScan analysis pipeline
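The shadow-feature finding lends itself to a simple static check: compare the capabilities a skill advertises in its documentation with the capabilities its code actually exercises. The sketch below is illustrative only; the capability markers, the `capability:` documentation format, and the function names are assumptions for exposition, not SkillScan's actual rule set.

```python
import re

def declared_capabilities(readme: str) -> set[str]:
    """Capabilities a skill advertises in its docs (hypothetical format:
    lines like 'capability: network_access')."""
    return set(re.findall(r"capability:\s*(\w+)", readme))

def observed_capabilities(code: str) -> set[str]:
    """Rough static markers for capabilities the code actually uses.
    The marker table is illustrative, not the paper's rule set."""
    markers = {
        "network_access": ("requests.", "urllib", "socket."),
        "file_read": ("open(", "pathlib"),
        "subprocess_exec": ("subprocess", "os.system"),
        "env_read": ("os.environ",),
    }
    return {cap for cap, pats in markers.items()
            if any(p in code for p in pats)}

def shadow_features(readme: str, code: str) -> set[str]:
    """Capabilities present in code but absent from the docs -- the study
    reports these in 100% of advanced attacks and 0% of basic ones."""
    return observed_capabilities(code) - declared_capabilities(readme)
```

For example, a skill whose docs declare only `file_read` but whose code imports `subprocess` would be flagged with the shadow feature `subprocess_exec`.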

🛡️ Threat Analysis

AI Supply Chain Attacks

The distribution of malicious skills through community registries with minimal vetting is a textbook supply chain attack on the LLM agent ecosystem; the 'Data Thieves' archetype explicitly uses supply chain techniques to exfiltrate credentials, and a single threat actor conducts templated brand impersonation across the registry.
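Templated brand impersonation, the tactic behind the single actor's 54.1% share, can be surfaced with basic name-similarity screening against known brands. The snippet below is a minimal sketch: the brand list, the 0.7 threshold, and the use of `difflib.SequenceMatcher` are all assumptions for illustration, not the study's methodology.

```python
from difflib import SequenceMatcher

# Illustrative brand list; a real screen would use the registry's actual
# high-value targets.
KNOWN_BRANDS = ["stripe", "openai", "github", "slack"]

def impersonation_score(skill_name: str) -> tuple[str, float]:
    """Highest similarity between a skill's name and any known brand."""
    name = skill_name.lower()
    best = max(KNOWN_BRANDS,
               key=lambda b: SequenceMatcher(None, name, b).ratio())
    return best, SequenceMatcher(None, name, best).ratio()

def flag_impersonators(names, threshold=0.7):
    """Skills whose names closely mimic a brand without matching it exactly,
    the signature of a templated impersonation campaign."""
    flagged = []
    for n in names:
        brand, score = impersonation_score(n)
        if threshold <= score < 1.0:
            flagged.append((n, brand, round(score, 2)))
    return flagged
```

A typosquatted name like "strlpe" scores high against "stripe" and is flagged, while an unrelated name like "weatherbot" passes.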


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
community agent skill registries (98,380 skills, 2 registries)
Applications
llm agent platforms, agent skill registries, ai assistant plugins