Wenpeng Xing

attack arXiv Apr 9, 2026 · 6w ago

Wenpeng Xing, Moran Fang, Guangtai Wang et al. · Zhejiang University · Binjiang Institute of Zhejiang University +1 more

Inference-time jailbreak attack that surgically ablates safety guardrails by suppressing refusal-inducing activation patterns in LLM hidden states

Prompt Injection nlp

defense arXiv Aug 14, 2025 · Aug 2025

Wenpeng Xing, Zhonghao Qi, Yupeng Qin et al. · Zhejiang University · Binjiang Institute of Zhejiang University +3 more

Defends LLM-tool MCP interfaces from prompt injection and data exfiltration via a three-stage neural detection pipeline

Insecure Plugin Design Prompt Injection nlp

defense arXiv Sep 3, 2025 · Sep 2025

Zhenhua Xu, Meng Han, Wenpeng Xing · Zhejiang University · GenTel.io

Detects stolen LLMs via memorization-based probabilistic fingerprints that remain stealthy and robust under gray-box API access

Model Theft Model Theft nlp

defense arXiv Aug 31, 2025 · Aug 2025

Xubin Yue, Zhenhua Xu, Wenpeng Xing et al. · Zhejiang University · GenTel.io +1 more

Embeds ownership fingerprints in LLM parameter offsets via dual-channel knowledge editing, resisting fine-tuning erasure and feature-space defenses

Model Theft Model Theft nlp

Papers in Database (4)