SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
Published on arXiv: 2510.15476
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Unifies fragmented LLM prompt security research through a holistic taxonomy, machine-readable threat models, and the largest annotated jailbreak benchmark to date.
JAILBREAKDB
Novel technique introduced
Large Language Models (LLMs) have rapidly become integral to real-world applications, powering services across diverse sectors. However, their widespread deployment has exposed critical security risks, particularly through jailbreak prompts that can bypass model alignment and induce harmful outputs. Despite intense research into both attack and defense techniques, the field remains fragmented: definitions, threat models, and evaluation criteria vary widely, impeding systematic progress and fair comparison. In this Systematization of Knowledge (SoK), we address these challenges by (1) proposing a holistic, multi-level taxonomy that organizes attacks, defenses, and vulnerabilities in LLM prompt security; (2) formalizing threat models and cost assumptions into machine-readable profiles for reproducible evaluation; (3) introducing an open-source evaluation toolkit for standardized, auditable comparison of attacks and defenses; (4) releasing JAILBREAKDB, the largest annotated dataset of jailbreak and benign prompts to date (available at https://huggingface.co/datasets/youbin2014/JailbreakDB); and (5) presenting a comprehensive evaluation platform and leaderboard of state-of-the-art methods (to be released). Our work unifies fragmented research, provides rigorous foundations for future studies, and supports the development of robust, trustworthy LLMs suitable for high-stakes deployment.
Key Contributions
- Multi-level taxonomy organizing LLM prompt security attacks, defenses, and vulnerabilities with formalized, machine-readable threat model profiles
- JAILBREAKDB: the largest annotated dataset of jailbreak and benign prompts to date, released on Hugging Face
- Open-source evaluation toolkit and leaderboard enabling standardized, reproducible comparison of attack and defense methods
🛡️ Threat Analysis
The taxonomy covers gradient-based adversarial suffix attacks on LLMs (e.g., GCG-style token-level optimization) alongside natural-language jailbreaks, making ML01 (Input Manipulation) relevant as a secondary category for this survey.
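To make the GCG-style attack category concrete, the sketch below illustrates the core idea of greedy coordinate optimization over suffix tokens. It is a toy illustration only: there is no real LLM, and `surrogate_loss`, `TARGET`, `VOCAB_SIZE`, and `SUFFIX_LEN` are all hypothetical stand-ins. A real attack would rank candidate token substitutions using gradients of the model's loss with respect to one-hot token embeddings; here each position is brute-forced over a tiny vocabulary to show the coordinate-wise update loop.

```python
# Toy sketch of GCG-style greedy coordinate search over suffix tokens.
# Illustration only: no real model is involved, and the surrogate loss
# is a hypothetical stand-in for the model's negative log-likelihood
# of a target completion given prompt + suffix.

VOCAB_SIZE = 50   # hypothetical tiny token vocabulary
SUFFIX_LEN = 8    # length of the adversarial suffix
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]  # stand-in "ideal" suffix for the toy loss

def surrogate_loss(suffix):
    """Toy separable loss: distance of the suffix to TARGET."""
    return sum(abs(a - b) for a, b in zip(suffix, TARGET))

def gcg_style_search(sweeps=3):
    """Greedy coordinate descent: at each position, try every token
    and keep the substitution that most reduces the loss."""
    suffix = [0] * SUFFIX_LEN
    for _ in range(sweeps):
        for pos in range(SUFFIX_LEN):
            suffix[pos] = min(
                range(VOCAB_SIZE),
                key=lambda t: surrogate_loss(suffix[:pos] + [t] + suffix[pos + 1:]),
            )
    return suffix, surrogate_loss(suffix)

suffix, loss = gcg_style_search()
print(suffix, loss)
```

Because this toy loss is separable across positions, a single sweep already reaches the optimum; in the real token-level setting the loss is highly non-separable, which is why GCG-style methods need gradient-guided candidate ranking and many iterations.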