Rahul Gupta

h-index: 3 187 citations 6 papers (total)

Papers in Database (1)

defense arXiv Oct 24, 2025 · Oct 2025

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa et al. · Virginia Tech · Princeton University +1 more

Defends LLMs against novel jailbreaks by training on diverse compositions of adversarial skill primitives extracted from 32 prior attacks

Prompt Injection nlp
1 citations PDF