Latest papers

2 papers
defense arXiv Feb 15, 2026

MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents

Zhenhong Zhou, Yuanhe Zhang, Hongwei Cai et al. · NTU · BUPT +3 more

Proposes MCPShield, a lifecycle-aware security layer defending LLM agents against malicious third-party MCP tool servers

Insecure Plugin Design nlp
defense arXiv Sep 29, 2025

DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models

Zherui Li, Zheng Nie, Zhenhong Zhou et al. · Beijing University of Posts and Telecommunications · National University of Singapore +5 more

Defends diffusion LLMs against jailbreaks by mitigating the bias of greedy remasking and applying block-level autonomous safety repair

Prompt Injection nlp
3 citations · 2 influential