Adarsh Kumarappan

benchmark · arXiv · Nov 24, 2025

Automating Deception: Scalable Multi-Turn LLM Jailbreaks

Adarsh Kumarappan, Ananya Mujoo · California Institute of Technology · Evergreen Valley College

Automated pipeline generating 1,500 psychologically grounded multi-turn foot-in-the-door (FITD) jailbreaks; GPT-family models show a 32-percentage-point increase in attack success rate (ASR) with conversational history

Prompt Injection · nlp

2 citations

defense · arXiv · Nov 24, 2025

Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM

Adarsh Kumarappan, Ayushi Mehrotra · California Institute of Technology

Probabilistic (k,ε)-unstable certificate tightens SmoothLLM's jailbreak defense guarantees for both GCG and PAIR attacks

Input Manipulation Attack · Prompt Injection · nlp

1 citation
