Johan Wahréus

Papers in Database (2)

benchmark arXiv Jan 2, 2025 · Jan 2025

CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

Johan Wahréus, Ahmed Mohamed Hussain, Panos Papadimitratos · KTH Royal Institute of Technology

Introduces cybersecurity-domain jailbreak benchmark with 12,662 prompts; prompt obfuscation attack achieves 88% success on Gemini

Prompt Injection nlp
PDF
attack arXiv Sep 16, 2025 · Sep 2025

Jailbreaking Large Language Models Through Content Concretization

Johan Wahréus, Ahmed Hussain, Panos Papadimitratos · KTH Royal Institute of Technology

Iterative two-stage jailbreak escalates abstract malicious prompts to executable code, hitting 62% success rate at 7.5¢ per prompt

Prompt Injection nlp
PDF