Yutao Mou

h-index: 4 · 110 citations · 8 papers (total)

Papers in Database (2)

defense · arXiv · Jan 15, 2026

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Yutao Mou, Zhangchi Xue, Lijun Li et al. · Peking University · Shanghai Artificial Intelligence Laboratory

A proactive step-level guardrail for LLM agent tool calls defends against malicious requests and prompt injection, cutting harmful invocations by 65%

Insecure Plugin Design · Prompt Injection · nlp
2 citations
attack · arXiv · Oct 9, 2025

AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming

Muxi Diao, Yutao Mou, Keqing He et al. · Beijing University of Posts and Telecommunications · Peking University +1 more

A seed-free LLM red-teaming framework uses persona-guided generation and reflection loops to produce diverse jailbreak prompts with a high attack success rate (ASR)

Prompt Injection · nlp