Latest papers

2 papers
benchmark · arXiv · Jan 7, 2026

RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

Quy-Anh Dang, Chris Ngo, Truong-Son Hy · VNU University of Science · Knovel +1 more

Aggregates 37 red-teaming datasets into a unified LLM benchmark with a standardized taxonomy spanning 22 risk categories

Prompt Injection · nlp
survey · arXiv · Oct 17, 2025

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

Hanbin Hong, Shuya Feng, Nima Naderloui et al. · University of Connecticut · University of Alabama at Birmingham

SoK survey unifying an LLM jailbreak taxonomy, threat models, an evaluation toolkit, and the largest annotated jailbreak dataset

Input Manipulation Attack · Prompt Injection · nlp
2 citations · 1 influential