When Skills Lie: Hidden-Comment Injection in LLM Agents
Qianli Wang , Boyang Ma , Minghui Xu , Yue Zhang
Published on arXiv
2602.10498
Prompt Injection
OWASP LLM Top 10 — LLM01
Insecure Plugin Design
OWASP LLM Top 10 — LLM07
Key Finding
DeepSeek-V3.2 and GLM-4.5-Air both follow malicious instructions hidden in HTML comment blocks within Skill documents, generating sensitive tool-call intentions; a defensive system prompt blocks these calls and exposes the hidden payload.
Hidden-Comment Skill Injection
Novel technique introduced
LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.
Key Contributions
- Identifies and demonstrates a hidden-comment prompt injection vulnerability in LLM agent Skill documents, where HTML comments invisible to human reviewers are still processed verbatim by the model.
- Shows that DeepSeek-V3.2 and GLM-4.5-Air can be steered toward sensitive tool calls (environment variable enumeration, credential file reads, exfiltration HTTP requests) via this vector during benign user tasks.
- Proposes a two-tiered defense combining an untrusted-Skill system prompt guardrail with execution-layer blocking that prevents malicious tool invocations and surfaces hidden instructions.