Fengheng Chu

h-index: 1 · 26 citations · 3 papers (total)

Papers in Database (1)

attack · arXiv · Jan 22, 2026

Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models

Fengheng Chu, Jiahao Chen, Yuhong Wang et al. · Southeast University · Zhejiang University +1 more

White-box jailbreak exploits safety-critical attention heads via activation repatching to bypass LLM safety guardrails

Prompt Injection · nlp