Bo Jin

attack · arXiv · Sep 4, 2025

Xin Tong, Zhi Lin, Jingya Wang et al. · People’s Public Security University of China · Tsinghua University +2 more

Factorizes LLM refusal directions into topic-specific vectors to achieve a fine-grained, semantically controlled bypass of safety alignment

Prompt Injection · nlp

Papers in Database (1)