ML Security Papers

Latest papers

2 papers

benchmark arXiv Jan 7, 2026 · 12w ago

Quy-Anh Dang, Chris Ngo, Truong-Son Hy · VNU University of Science · Knovel +1 more

Aggregates 37 red-teaming datasets into a unified LLM benchmark with standardized taxonomy across 22 risk categories

Prompt Injection nlp

survey arXiv Oct 17, 2025 · Oct 2025

Hanbin Hong, Shuya Feng, Nima Naderloui et al. · University of Connecticut · University of Alabama at Birmingham

SoK survey unifying LLM jailbreak taxonomy, threat models, evaluation toolkit, and the largest annotated jailbreak dataset

Input Manipulation Attack Prompt Injection nlp

2 citations 1 influentialPDF Code