benchmark 2025

CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

Johan Wahréus , Ahmed Mohamed Hussain , Panos Papadimitratos

KTH Royal Institute of Technology

0 citations

Published on arXiv

2501.01335

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Prompt obfuscation attack achieves 88% jailbreak success rate on Gemini and 65% on ChatGPT, with Claude showing greater resilience at only 17%; method outperforms state-of-the-art on AdvBench at 78.5%

CySecBench prompt obfuscation jailbreak

Novel technique introduced

Numerous studies have investigated methods for jailbreaking Large Language Models (LLMs) to generate harmful content. Typically, these methods are evaluated using datasets of malicious prompts designed to bypass security policies established by LLM providers. However, the generally broad scope and open-ended nature of existing datasets can complicate the assessment of jailbreaking effectiveness, particularly in specific domains, notably cybersecurity. To address this issue, we present and publicly release CySecBench, a comprehensive dataset containing 12662 prompts specifically designed to evaluate jailbreaking techniques in the cybersecurity domain. The dataset is organized into 10 distinct attack-type categories, featuring close-ended prompts to enable a more consistent and accurate assessment of jailbreaking attempts. Furthermore, we detail our methodology for dataset generation and filtration, which can be adapted to create similar datasets in other domains. To demonstrate the utility of CySecBench, we propose and evaluate a jailbreaking approach based on prompt obfuscation. Our experimental results show that this method successfully elicits harmful content from commercial black-box LLMs, achieving Success Rates (SRs) of 65% with ChatGPT and 88% with Gemini; in contrast, Claude demonstrated greater resilience with a jailbreaking SR of 17%. Compared to existing benchmark approaches, our method shows superior performance, highlighting the value of domain-specific evaluation datasets for assessing LLM security measures. Moreover, when evaluated using prompts from a widely used dataset (i.e., AdvBench), it achieved an SR of 78.5%, higher than the state-of-the-art methods.

Key Contributions

CySecBench: a publicly released dataset of 12,662 close-ended, cybersecurity-focused prompts organized into 10 attack-type categories for evaluating LLM jailbreaking
A dataset generation and filtration methodology adaptable to other domains
A prompt obfuscation jailbreaking approach achieving 65% SR on ChatGPT, 88% on Gemini, and 78.5% on AdvBench (surpassing state-of-the-art)

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_time

Datasets

CySecBenchAdvBench

Applications

llm safety evaluationcybersecurity-domain jailbreak assessment

Read PDF arXiv

CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Behind the Mask: Benchmarking Camouflaged Jailbreaks in Large Language Models

MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs

Quantifying CBRN Risk in Frontier Models

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Bayesian Evaluation of Large Language Model Behavior

Vulnerability of LLMs' Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions

Consistency of Large Reasoning Models Under Multi-Turn Attacks