Vipin Chaudhary

defense arXiv Feb 4, 2026 · 8w ago

Debargha Ganguly, Sreehari Sankar, Biyao Zhang et al. · Case Western Reserve University · University of Pittsburgh +2 more

Defends LLMs against jailbreaks via OOD detection on safe prompts, reducing false positives by 40x over specialized safety models

Prompt Injection nlp

1 citations PDF

attack TPS-ISA Oct 21, 2025 · Oct 2025

Alexander Nemecek, Zebin Yun, Zahra Rahmani et al. · Case Western Reserve University · Tel Aviv University

Evaluates membership inference attacks on clinical LLMs fine-tuned on EHR data using loss-based and paraphrase-perturbation methods

Membership Inference Attack Sensitive Information Disclosure nlp

Papers in Database (2)