Kanan Gupta

h-index: 1 1 citations 1 papers (total)

Papers in Database (1)

defense arXiv Feb 4, 2026 · 8w ago

Trust The Typical

Debargha Ganguly, Sreehari Sankar, Biyao Zhang et al. · Case Western Reserve University · University of Pittsburgh +2 more

Defends LLMs against jailbreaks via OOD detection on safe prompts, reducing false positives by 40x over specialized safety models

Prompt Injection nlp
1 citations PDF