Mohan Li

attack arXiv Aug 8, 2025 · Aug 2025

Wenpeng Xing, Mohan Li, Chunqiang Hu et al. · Bingjiang Institute of Zhejiang University · Zhejiang University +3 more

White-box jailbreak fuses harmful and benign hidden states in latent space to bypass LLM safety alignment with 94% ASR

Input Manipulation Attack Prompt Injection nlp

defense arXiv Aug 31, 2025 · Aug 2025

Xubin Yue, Zhenhua Xu, Wenpeng Xing et al. · Zhejiang University · GenTel.io +1 more

Embeds ownership fingerprints in LLM parameter offsets via dual-channel knowledge editing, resisting fine-tuning erasure and feature-space defenses

Model Theft Model Theft nlp

Papers in Database (2)