ML Security Papers

benchmark · arXiv · Nov 24, 2025

Automating Deception: Scalable Multi-Turn LLM Jailbreaks

Adarsh Kumarappan, Ananya Mujoo · California Institute of Technology · Evergreen Valley College

Automated pipeline that generates 1,500 psychologically grounded, multi-turn foot-in-the-door (FITD) jailbreaks; the GPT family shows a 32-percentage-point increase in attack success rate (ASR) with conversational history.

Prompt Injection · nlp

2 citations PDF
