Naiqiang Tan

h-index: 6 · 284 citations · 11 papers (total)

Papers in Database (1)

defense arXiv Jan 23, 2026

SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment

Xianya Fang, Xianying Luo, Yadong Wang et al. · Nanjing University of Aeronautics and Astronautics · Tsinghua University +3 more

Adaptive three-stage LLM defense routes inputs by risk level to counter jailbreaks and prefilling attacks without sacrificing utility

Prompt Injection nlp