When Skills Lie: Hidden-Comment Injection in LLM Agents

LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.

Key Contributions

Identifies and demonstrates a hidden-comment prompt injection vulnerability in LLM agent Skill documents, where HTML comments invisible to human reviewers are still processed verbatim by the model.
Shows that DeepSeek-V3.2 and GLM-4.5-Air can be steered toward sensitive tool calls (environment variable enumeration, credential file reads, exfiltration HTTP requests) via this vector during benign user tasks.
Proposes a two-tiered defense combining an untrusted-Skill system prompt guardrail with execution-layer blocking that prevents malicious tool invocations and surfaces hidden instructions.