Bing Qin

defense arXiv Jan 7, 2026

Di Wu, Yanyan Zhao, Xin Lu et al. · Harbin Institute of Technology

Self-improving safety alignment trains LLMs to iteratively reason over safety rules to resist jailbreak attacks

Prompt Injection nlp

1 citation

benchmark arXiv Jan 25, 2026

Jiahe Guo, Xiangran Guo, Yulin Hu et al. · Harbin Institute of Technology · Ltd

Personalized LLM agent memory biases intent inference, causing 15–244% higher attack success rates on harmful queries than stateless baselines

Prompt Injection nlp

Papers in Database (2)