Latest papers

1 paper
defense · arXiv · Jan 27, 2026

LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment

Haonan Zhang, Dongxia Wang, Yi Liu et al. · Zhejiang University · Huzhou Institute of Industrial Control Technology +1 more

Defends LLMs against jailbreaks and over-refusal simultaneously by aligning safety and answer vectors via closed-form weight updates.
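The summary mentions "closed-form weight updates" that align vectors. This is not the paper's actual method (which is not detailed here); as a hedged illustration of the general idea, the sketch below shows a generic rank-one closed-form update that maps a chosen input direction `k` to a target output direction `v_star` while leaving orthogonal directions untouched. All variable names are hypothetical.

```python
import numpy as np

# Illustrative sketch only -- NOT the LLM-VA method. Shows a generic
# closed-form rank-one weight update: after the edit, W' maps the probe
# direction k exactly to the target direction v_star, and any input
# orthogonal to k is unchanged.

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))   # original weight matrix
k = rng.standard_normal(d)        # input direction (e.g. a hypothetical "safety" probe)
v_star = rng.standard_normal(d)   # desired output direction (e.g. an "answer" vector)

# Closed-form rank-one update: W' = W + (v_star - W k) k^T / (k^T k)
delta = np.outer(v_star - W @ k, k) / (k @ k)
W_new = W + delta

# The edited weights send k exactly to v_star:
assert np.allclose(W_new @ k, v_star)

# Directions orthogonal to k are untouched:
q = rng.standard_normal(d)
q -= ((q @ k) / (k @ k)) * k
assert np.allclose(W_new @ q, W @ q)
```

The appeal of such closed-form edits is that they require no gradient-based fine-tuning: the update is computed analytically in one step.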

Prompt Injection · NLP
PDF · Code