Peijie Sun

h-index: 19 · 1,983 citations · 44 papers (total)

Papers in Database (1)

attack arXiv Nov 10, 2025 · Nov 2025

Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment

Peng Zhang, Peijie Sun · Nanjing University of Posts and Telecommunications

White-box activation attack decomposes LLM safety alignment into two directions and neutralizes both, achieving a 97.88% jailbreak success rate on Llama-2

Prompt Injection nlp
1 citation · PDF