Nikhil Reddy Billa

h-index: 2 · 26 citations · 4 papers (total)

Papers in Database (2)

defense · arXiv · Oct 24, 2025

Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks

Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa et al. · Virginia Tech · Princeton University +1 more

Defends LLMs against novel jailbreaks by training on diverse compositions of adversarial skill primitives extracted from 32 prior attacks

Prompt Injection · nlp
1 citation

attack · EMNLP · Oct 27, 2025

Retracing the Past: LLMs Emit Training Data When They Get Lost

Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen et al. · Virginia Tech · Cisco Research

Extracts verbatim LLM training data by optimizing prompts to spike token entropy, achieving a 22% extraction rate on Llama 2-70B

Model Inversion Attack · Sensitive Information Disclosure · nlp