benchmark arXiv Apr 3, 2026 · 5d ago
Zhihao Chen, Ying Zhang, Yi Liu et al. · Fujian Normal University · Wake Forest University +7 more
Large-scale analysis of 17K LLM agent skills finding 520 vulnerable to credential leakage via debug logging and prompt injection
AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp
Third-party skills extend LLM agents with powerful capabilities but often handle sensitive credentials in privileged environments, making leakage risks poorly understood. We present the first large-scale empirical study of this problem, analyzing 17,022 skills (sampled from 170,226 on SkillsMP) using static analysis, sandbox testing, and manual inspection. We identify 520 vulnerable skills with 1,708 issues and derive a taxonomy of 10 leakage patterns (4 accidental and 6 adversarial). We find that (1) leakage is fundamentally cross-modal: 76.3% require joint analysis of code and natural language, while 3.1% arise purely from prompt injection; (2) debug logging is the primary vector, with print and console.log causing 73.5% of leaks due to stdout exposure to LLMs; and (3) leaked credentials are both exploitable (89.6% without privileges) and persistent, as forks retain secrets even after upstream fixes. After disclosure, all malicious skills were removed and 91.6% of hardcoded credentials were fixed. We release our dataset, taxonomy, and detection pipeline to support future research.
llm Fujian Normal University · Wake Forest University · Quantstamp +6 more
attack arXiv Apr 3, 2026 · 5d ago
Yubin Qu, Yi Liu, Tongcheng Geng et al. · Griffith University · Quantstamp +6 more
Supply-chain attack embedding malicious payloads in LLM agent skill documentation, achieving up to 33.5% bypass of defenses
AI Supply Chain Attacks Insecure Plugin Design Excessive Agency nlp
LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATTACK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.
llm Griffith University · Quantstamp · Nanyang Technological University +5 more