Johan Wahréus

benchmark · arXiv · Jan 2, 2025

Johan Wahréus, Ahmed Mohamed Hussain, Panos Papadimitratos · KTH Royal Institute of Technology

Introduces a cybersecurity-domain jailbreak benchmark of 12,662 prompts; a prompt-obfuscation attack achieves an 88% success rate on Gemini.

Prompt Injection nlp

attack · arXiv · Sep 16, 2025

Johan Wahréus, Ahmed Hussain, Panos Papadimitratos · KTH Royal Institute of Technology

An iterative two-stage jailbreak escalates abstract malicious prompts into executable code, reaching a 62% success rate at 7.5¢ per prompt.

Prompt Injection nlp

Papers in Database (2)