benchmark · arXiv · Oct 14, 2025
Giacomo Bertollo, Naz Bodemir, Jonah Burgess
A CTF study of 500 participants shows that layered, multi-step AI guardrails resist common jailbreak techniques far better than simple defenses
Prompt Injection · nlp · llm
Analyzing data from 500 CTF participants, this paper shows that while participants readily bypassed simple AI guardrails using common techniques, layered, multi-step defenses still posed significant challenges, offering concrete insights for building safer AI systems.