Bing Qin

h-index: 11 · 390 citations · 43 papers (total)

Papers in Database (2)

defense arXiv Jan 7, 2026

STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules

Di Wu, Yanyan Zhao, Xin Lu et al. · Harbin Institute of Technology

Self-improving safety alignment trains LLMs to iteratively reason over safety rules to resist jailbreak attacks

Prompt Injection nlp
1 citation

benchmark arXiv Jan 25, 2026

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Jiahe Guo, Xiangran Guo, Yulin Hu et al. · Harbin Institute of Technology · Ltd

Personalized LLM agent memory biases intent inference, causing 15–244% higher attack success rates on harmful queries than stateless baselines

Prompt Injection nlp