Jizhong Han

Papers in Database (1)

defense arXiv Apr 24, 2026 · 27d ago

RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents

Wenjie Xiao, Xuehai Tang, Biyu Zhou et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences

Detects poisoned LLM agent skills by identifying attention hijacking patterns where malicious instructions redirect model reasoning

Prompt Injection Excessive Agency nlp
PDF