Wenpeng Xing

Papers in Database (4)

attack arXiv Apr 9, 2026

Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation

Wenpeng Xing, Moran Fang, Guangtai Wang et al. · Zhejiang University · Binjiang Institute of Zhejiang University +1 more

Inference-time jailbreak attack that surgically ablates safety guardrails by suppressing refusal-inducing activation patterns in LLM hidden states

Prompt Injection · nlp

defense arXiv Aug 14, 2025

MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI

Wenpeng Xing, Zhonghao Qi, Yupeng Qin et al. · Zhejiang University · Binjiang Institute of Zhejiang University +3 more

Defends LLM-tool MCP interfaces from prompt injection and data exfiltration via a three-stage neural detection pipeline

Insecure Plugin Design · Prompt Injection · nlp

defense arXiv Sep 3, 2025

EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint

Zhenhua Xu, Meng Han, Wenpeng Xing · Zhejiang University · GenTel.io

Detects stolen LLMs via memorization-based probabilistic fingerprints that remain stealthy and robust under gray-box API access

Model Theft · nlp

defense arXiv Aug 31, 2025

PREE: Towards Harmless and Adaptive Fingerprint Editing in Large Language Models via Knowledge Prefix Enhancement

Xubin Yue, Zhenhua Xu, Wenpeng Xing et al. · Zhejiang University · GenTel.io +1 more

Embeds ownership fingerprints in LLM parameter offsets via dual-channel knowledge editing, resisting fine-tuning erasure and feature-space defenses

Model Theft · nlp