survey 2025

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

Hanbin Hong 1, Shuya Feng 1,2, Nima Naderloui 1, Shenao Yan 1, Jingyu Zhang, Biying Liu, Ali Arastehfard 1, Heqing Huang, Yuan Hong 1

2 citations · 1 influential · 239 references · arXiv


Published on arXiv (2510.15476)

Input Manipulation Attack (OWASP ML Top 10: ML01)

Prompt Injection (OWASP LLM Top 10: LLM01)

Key Finding

Unifies fragmented LLM prompt security research through a holistic taxonomy, machine-readable threat models, and the largest annotated jailbreak benchmark to date.

JAILBREAKDB (novel technique introduced)


Large Language Models (LLMs) have rapidly become integral to real-world applications, powering services across diverse sectors. However, their widespread deployment has exposed critical security risks, particularly through jailbreak prompts that can bypass model alignment and induce harmful outputs. Despite intense research into both attack and defense techniques, the field remains fragmented: definitions, threat models, and evaluation criteria vary widely, impeding systematic progress and fair comparison. In this Systematization of Knowledge (SoK), we address these challenges by (1) proposing a holistic, multi-level taxonomy that organizes attacks, defenses, and vulnerabilities in LLM prompt security; (2) formalizing threat models and cost assumptions into machine-readable profiles for reproducible evaluation; (3) introducing an open-source evaluation toolkit for standardized, auditable comparison of attacks and defenses; (4) releasing JAILBREAKDB, the largest annotated dataset of jailbreak and benign prompts to date (released at https://huggingface.co/datasets/youbin2014/JailbreakDB); and (5) presenting a comprehensive evaluation platform and leaderboard of state-of-the-art methods (to be released soon). Our work unifies fragmented research, provides rigorous foundations for future studies, and supports the development of robust, trustworthy LLMs suitable for high-stakes deployment.
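The abstract's contribution (2) formalizes threat models and cost assumptions as machine-readable profiles. A minimal sketch of what such a profile could look like is below; the schema, field names, and values are illustrative assumptions, not the paper's actual format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical threat-model profile schema (illustrative assumption only):
# it captures the attacker's access level, the attack stage, and cost
# assumptions such as a query budget, so evaluations can be reproduced.
@dataclass
class ThreatModelProfile:
    attack_name: str
    access: str            # "black_box" | "grey_box" | "white_box"
    stage: str             # e.g. "inference_time"
    query_budget: int      # cost assumption: max queries to the target LLM
    needs_logits: bool = False
    needs_gradients: bool = False

# Example profile for a gradient-based suffix attack (values are made up).
profile = ThreatModelProfile(
    attack_name="gcg_suffix",
    access="white_box",
    stage="inference_time",
    query_budget=500,
    needs_gradients=True,
)

# Serialize to JSON so the profile can be stored and audited alongside results.
serialized = json.dumps(asdict(profile), indent=2)
```

Serializing the profile to JSON keeps it both human-auditable and loadable by an evaluation harness, which is the point of making threat models machine-readable.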


Key Contributions

  • Multi-level taxonomy organizing LLM prompt security attacks, defenses, and vulnerabilities with formalized, machine-readable threat model profiles
  • JAILBREAKDB: the largest annotated dataset of jailbreak and benign prompts released on HuggingFace
  • Open-source evaluation toolkit and leaderboard enabling standardized, reproducible comparison of attack and defense methods

🛡️ Threat Analysis

Input Manipulation Attack

The taxonomy covers gradient-based adversarial suffix attacks (e.g., GCG-style token-level optimization) alongside natural-language jailbreaks, making ML01 (Input Manipulation Attack) a relevant secondary category for this survey.
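As a toy illustration of the coordinate-wise token substitution that GCG-style attacks perform, the sketch below runs a greedy search over a synthetic, separable loss. Everything here is a stand-in: `toy_loss` replaces the real objective (a model's loss on a target completion), and the vocabulary and sweep counts are arbitrary; a real attack also uses gradients to shortlist candidate tokens rather than enumerating the vocabulary.

```python
import random

# Synthetic stand-in loss over token ids; the real GCG objective would be
# the target model's negative log-likelihood of a harmful completion.
TARGET = [7, 3, 11, 0]

def toy_loss(suffix):
    return sum((a - b) ** 2 for a, b in zip(suffix, TARGET))

def greedy_coordinate_search(vocab_size=16, suffix_len=4, sweeps=3, seed=0):
    """Coordinate-wise discrete search: sweep over suffix positions and
    replace each token with the candidate that most lowers the loss,
    mimicking GCG's token-level substitution loop on a toy problem."""
    rng = random.Random(seed)
    suffix = [rng.randrange(vocab_size) for _ in range(suffix_len)]
    for _ in range(sweeps):
        for pos in range(suffix_len):
            best = min(
                range(vocab_size),
                key=lambda t: toy_loss(suffix[:pos] + [t] + suffix[pos + 1:]),
            )
            suffix[pos] = best  # greedy update never increases the loss
    return suffix, toy_loss(suffix)

suffix, loss = greedy_coordinate_search()
```

Because the toy loss is separable across positions, a single sweep already reaches the optimum; real model losses are not separable, which is why GCG needs gradient-guided candidate selection and many iterations.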


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, grey_box, white_box, inference_time
Datasets
JAILBREAKDB
Applications
large language models, chatbots, AI-powered services