Jinman Wu

attack arXiv Mar 6, 2026 · 4w ago

Jinman Wu, Yi Xie, Shen Lin et al. · Xidian University · Tsinghua University +2 more

Discovers two disentangled safety subspaces in LLMs and exploits them to surgically disable refusal while preserving harmfulness recognition

Prompt Injection nlp

attack arXiv Mar 6, 2026 · 4w ago

Jinman Wu, Yi Xie, Shiqian Zhao et al. · Xidian University · Tsinghua University +1 more

White-box jailbreak targeting LLM attention heads via layer-wise perturbation, improving ASR 14% over SOTA

Input Manipulation Attack Prompt Injection nlp

Papers in Database (2)