Qianli Zhou

h-index: 1 · 2 citations · 2 papers (total)

Papers in Database (1)

defense arXiv Nov 24, 2025 · Nov 2025

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation

Junbo Zhang, Ran Chen, Qianli Zhou et al. · Northwestern Polytechnical University

Defends LLMs against jailbreaks via safety-representation intervention that reduces over-refusal without sacrificing safety alignment

Prompt Injection nlp
1 citation