Haitao Xu

Papers in Database (2)

attack arXiv Aug 8, 2025 · Aug 2025

Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs

Wenpeng Xing, Mohan Li, Chunqiang Hu et al. · Bingjiang Institute of Zhejiang University · Zhejiang University +3 more

White-box jailbreak that fuses harmful and benign hidden states in latent space to bypass LLM safety alignment, reaching a 94% attack success rate (ASR)

Input Manipulation Attack Prompt Injection nlp
defense arXiv Aug 31, 2025 · Aug 2025

Unlocking the Effectiveness of LoRA-FP for Seamless Transfer Implantation of Fingerprints in Downstream Models

Zhenhua Xu, Zhaokun Yan, Binhan Xu et al. · Zhejiang University · China Academy of Information and Communications Technology +3 more

Embeds backdoor ownership fingerprints into LoRA adapters for lightweight, transferable LLM IP protection across downstream models

Model Theft Model Theft nlp