Florian Tramèr

h-index: 52 34,640 citations 93 papers (total)

Papers in Database (1)

benchmark arXiv Oct 10, 2025 · Oct 2025

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin et al. · OpenAI · Anthropic +6 more

Adaptive attacks via gradient descent, RL, and random search bypass 12 LLM jailbreak/prompt-injection defenses with >90% success rate

Input Manipulation Attack Prompt Injection nlp
34 citations 4 influentialPDF