Kanan Gupta

Defense · arXiv · Feb 4, 2026

Debargha Ganguly, Sreehari Sankar, Biyao Zhang et al. · Case Western Reserve University · University of Pittsburgh +2 more

Defends LLMs against jailbreaks via out-of-distribution (OOD) detection on safe prompts, reducing false positives 40x compared with specialized safety models.

Prompt Injection · NLP

1 citation

Papers in Database (1)