Changting Lin

Papers in Database (2)

attack arXiv Apr 9, 2026 · 6w ago

Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation

Wenpeng Xing, Moran Fang, Guangtai Wang et al. · Zhejiang University · Binjiang Institute of Zhejiang University +1 more

Inference-time jailbreak attack that surgically ablates safety guardrails by suppressing refusal-inducing activation patterns in LLM hidden states

Prompt Injection nlp
PDF
defense arXiv Aug 14, 2025 · Aug 2025

MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI

Wenpeng Xing, Zhonghao Qi, Yupeng Qin et al. · Zhejiang University · Binjiang Institute of Zhejiang University +3 more

Defends LLM-tool MCP interfaces from prompt injection and data exfiltration via a three-stage neural detection pipeline

Insecure Plugin Design Prompt Injection nlp
PDF