attack 2026

When Skills Lie: Hidden-Comment Injection in LLM Agents

Qianli Wang , Boyang Ma , Minghui Xu , Yue Zhang

0 citations · 6 references · arXiv (Cornell University)

α

Published on arXiv

2602.10498

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

DeepSeek-V3.2 and GLM-4.5-Air both follow malicious instructions hidden in HTML comment blocks within Skill documents, generating sensitive tool-call intentions; a defensive system prompt blocks these calls and exposes the hidden payload.

Hidden-Comment Skill Injection

Novel technique introduced


LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.


Key Contributions

  • Identifies and demonstrates a hidden-comment prompt injection vulnerability in LLM agent Skill documents, where HTML comments invisible to human reviewers are still processed verbatim by the model.
  • Shows that DeepSeek-V3.2 and GLM-4.5-Air can be steered toward sensitive tool calls (environment variable enumeration, credential file reads, exfiltration HTTP requests) via this vector during benign user tasks.
  • Proposes a two-tiered defense combining an untrusted-Skill system prompt guardrail with execution-layer blocking that prevents malicious tool invocations and surfaces hidden instructions.

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_timetargeteddigital
Applications
llm agentside assistantstool-augmented llm systems