Zhibo Zhang

attack arXiv Sep 8, 2025 · Sep 2025

Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift

Shuai Yuan, Zhibo Zhang, Yuxi Li et al. · University of Electronic Science and Technology of China · Huazhong University of Science and Technology +1 more

Injects adversarial perturbations into LLM embedding outputs at inference time to bypass safety alignment without modifying weights or prompts

Input Manipulation Attack Prompt Injection nlp

PDF

attack arXiv Apr 1, 2026 · 5d ago

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou et al. · Huazhong University of Science and Technology · Hubei University

Embeds latent trojans in individually safe LLMs that activate during model merging, bypassing safety alignment

Model Poisoning AI Supply Chain Attacks Prompt Injection nlp

PDF

Papers in Database (2)

Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion