Formal Analysis and Supply Chain Security for Agentic AI Skills
Published on arXiv: 2603.00195
AI Supply Chain Attacks (OWASP ML Top 10: ML06)
Insecure Plugin Design (OWASP LLM Top 10: LLM07)
Key Finding
SkillFortify achieves 96.95% F1 with 100% precision and a 0% false positive rate on 540 agent skills. By combining static analysis with information flow analysis, it detects attack patterns that pure pattern matching cannot see, and so outperforms heuristic tools.
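The reported figures also determine recall, which the summary does not state directly. The following is a worked check derived from the standard definition of F1 with precision P = 1, not a number quoted from the paper:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = 1 \;\Rightarrow\; F_1 = \frac{2R}{1 + R}
\;\Rightarrow\; R = \frac{F_1}{2 - F_1} = \frac{0.9695}{1.0305} \approx 0.941
```

Under this reading, roughly 94.1% of the malicious skills in the benchmark are flagged, and every error is a missed detection rather than a false alarm.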
Novel technique introduced: SkillFortify
The rapid proliferation of agentic AI skill ecosystems -- exemplified by OpenClaw (228,000 GitHub stars) and Anthropic Agent Skills (75,600 stars) -- has introduced a critical supply chain attack surface. The ClawHavoc campaign (January-February 2026) infiltrated over 1,200 malicious skills into the OpenClaw marketplace, while MalTool catalogued 6,487 malicious tools that evade conventional detection. In response, twelve reactive security tools emerged, yet all rely on heuristic methods that provide no formal guarantees. We present SkillFortify, the first formal analysis framework for agent skill supply chains, with six contributions: (1) the DY-Skill attacker model, a Dolev-Yao adaptation to the five-phase skill lifecycle with a maximality proof; (2) a sound static analysis framework grounded in abstract interpretation; (3) capability-based sandboxing with a confinement proof; (4) an Agent Dependency Graph with SAT-based resolution and lockfile semantics; (5) a trust score algebra with formal monotonicity; and (6) SkillFortifyBench, a 540-skill benchmark. SkillFortify achieves 96.95% F1 (95% CI: [95.1%, 98.4%]) with 100% precision and 0% false positive rate on 540 skills, while SAT-based resolution handles 1,000-node graphs in under 100 ms.
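To make the idea of constraint-based dependency resolution with lockfile semantics concrete, here is a minimal sketch. It is not SkillFortify's implementation: the registry contents, the skill names, and the `resolve` and `lockfile` helpers are all hypothetical, and an exhaustive search over version assignments stands in for a real SAT solver.

```python
import hashlib
import itertools
import json

# Hypothetical registry: skill -> version -> {dependency: allowed versions}.
REGISTRY = {
    "web-search": {
        "1.0": {"http-client": {"1.0", "2.0"}},
        "2.0": {"http-client": {"2.0"}},
    },
    "summarizer": {"1.0": {"http-client": {"1.0"}}},
    "http-client": {"1.0": {}, "2.0": {}},
}


def satisfies(assignment):
    """Check every dependency constraint of every selected skill version."""
    for skill, version in assignment.items():
        for dep, allowed in REGISTRY[skill][version].items():
            if assignment.get(dep) not in allowed:
                return False
    return True


def resolve(roots):
    """Return a consistent version assignment, or None if none exists.

    Brute force stands in for a SAT solver: each skill gets exactly one
    version, and every dependency constraint must hold.
    """
    # Collect every skill reachable from the roots through any version.
    needed, stack = set(), list(roots)
    while stack:
        skill = stack.pop()
        if skill in needed:
            continue
        needed.add(skill)
        for deps in REGISTRY[skill].values():
            stack.extend(deps)
    names = sorted(needed)
    # Try newer versions first (plain string order is enough for this toy data).
    choices = [sorted(REGISTRY[name], reverse=True) for name in names]
    for combo in itertools.product(*choices):
        assignment = dict(zip(names, combo))
        if satisfies(assignment):
            return assignment
    return None


def lockfile(assignment):
    """Pin each resolved version together with a digest of its manifest."""
    entries = {}
    for skill, version in sorted(assignment.items()):
        deps = {d: sorted(v) for d, v in REGISTRY[skill][version].items()}
        manifest = json.dumps(
            {"skill": skill, "version": version, "deps": deps}, sort_keys=True
        )
        digest = hashlib.sha256(manifest.encode()).hexdigest()
        entries[skill] = {"version": version, "sha256": digest}
    return entries


if __name__ == "__main__":
    resolved = resolve(["web-search", "summarizer"])
    print(json.dumps(lockfile(resolved), indent=2))
```

In this toy registry, `summarizer` forces `http-client` down to 1.0, which in turn rules out `web-search` 2.0. The lockfile then pins the resolved versions with content digests, so a later install can detect a manifest that has been swapped in the registry.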
Key Contributions
- DY-Skill attacker model — a formal Dolev-Yao adaptation to the five-phase agent skill lifecycle (authorship → registry → installation → runtime → state persistence) with a maximality proof that any symbolic supply-chain attacker is simulable by a DY-Skill trace
- Sound static analysis (abstract interpretation over a four-element capability lattice) and capability-based sandboxing with a formal confinement proof guaranteeing no authority amplification beyond declared capabilities
- SkillFortifyBench (540-skill benchmark drawn from MalTool and ClawHavoc) on which SkillFortify achieves 96.95% F1 with 100% precision and 0% false positive rate, and SAT-based dependency resolution completes 1,000-node graphs in under 100 ms
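The capability analysis and confinement property in the list above can be sketched as a check that the capabilities inferred from a skill's code never exceed those it declares. This is a minimal illustration under stated assumptions: the summary says only that the lattice has four elements, so the chain `NONE < READ < WRITE < ADMIN`, the `OP_EFFECTS` table, and the `infer` and `violations` helpers are hypothetical, and the flat, flow-insensitive pass over operation names is a deliberately crude stand-in for abstract interpretation.

```python
from enum import IntEnum


class Cap(IntEnum):
    # Assumed four-element chain; the source does not name the elements.
    NONE = 0
    READ = 1
    WRITE = 2
    ADMIN = 3


def join(a, b):
    """Least upper bound; on a totally ordered chain this is the maximum."""
    return max(a, b)


# Hypothetical abstraction: each primitive operation touches one resource
# at one capability level.
OP_EFFECTS = {
    "read_file": ("filesystem", Cap.READ),
    "write_file": ("filesystem", Cap.WRITE),
    "http_get": ("network", Cap.READ),
    "http_post": ("network", Cap.WRITE),
    "exec_shell": ("process", Cap.ADMIN),
}


def infer(ops):
    """Over-approximate the capabilities a skill may exercise.

    `ops` lists every operation appearing on any path through the skill.
    Unknown operations map to ADMIN on an "unknown" resource, so the
    result errs on the side of reporting too much.
    """
    inferred = {}
    for op in ops:
        resource, cap = OP_EFFECTS.get(op, ("unknown", Cap.ADMIN))
        inferred[resource] = join(inferred.get(resource, Cap.NONE), cap)
    return inferred


def violations(declared, ops):
    """Return resources whose inferred capability exceeds the declared one."""
    return {
        resource: cap
        for resource, cap in infer(ops).items()
        if cap > declared.get(resource, Cap.NONE)
    }


if __name__ == "__main__":
    declared = {"filesystem": Cap.READ, "network": Cap.READ}
    # A skill that reads a file and then posts it out exceeds its declaration.
    print(violations(declared, ["read_file", "http_post"]))
```

An empty result from `violations` is the toy analogue of confinement: nothing the analysis can attribute to the skill goes beyond what it declared. A skill that declares read-only network access but contains an `http_post` is flagged even when no known-bad string appears in its source.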
Threat Analysis
The paper's title and primary framing are explicitly about supply chain security for AI agent skills: detecting and preventing malicious skills distributed through agent marketplaces, analogous to trojaned packages in npm or PyPI. The five-phase skill lifecycle threat model, SAT-based dependency resolution, and trust score algebra all target the supply chain attack surface before and during installation.