Zequn Sun

h-index: 13 · 403 citations · 32 papers (total)

Papers in Database (1)

defense · arXiv · Jan 23, 2026

SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment

Xianya Fang, Xianying Luo, Yadong Wang et al. · Nanjing University of Aeronautics and Astronautics · Tsinghua University +3 more

Adaptive three-stage LLM defense routes inputs by risk level to counter jailbreaks and prefilling attacks without sacrificing utility

Prompt Injection · nlp