Yanyan Zhao

h-index: 1 · 3 citations · 3 papers (total)

Papers in Database (1)

defense · arXiv · Jan 7, 2026

STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules

Di Wu, Yanyan Zhao, Xin Lu et al. · Harbin Institute of Technology

Self-improving safety alignment trains LLMs to iteratively reason over safety rules to resist jailbreak attacks

Prompt Injection · nlp
1 citation