Jun Wang

h-index: 1 · 2 citations · 4 papers (total)

Papers in Database (2)

attack · arXiv · Jan 22, 2026

Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models

Fengheng Chu, Jiahao Chen, Yuhong Wang et al. · Southeast University · Zhejiang University +1 more

White-box jailbreak exploits safety-critical attention heads via activation repatching to bypass LLM safety guardrails

Prompt Injection · NLP
defense · arXiv · Jan 30, 2026

FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks

Naen Xu, Jinghuai Zhang, Ping He et al. · Zhejiang University · University of California +1 more

Knowledge-graph-based defense framework that detects fraud tactics in LLM inputs and augments prompts with evidence to resist manipulation

Prompt Injection · NLP