Yanshu Li

h-index: 8 · 295 citations · 31 papers (total)

Papers in Database (2)

benchmark · arXiv · Dec 5, 2025

TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations

Xiuyuan Chen, Jian Zhao, Yuxiang He et al. · Institute of Artificial Intelligence (TeleAI) of China Telecom · Shanghai Jiao Tong University +6 more

Benchmarks LLM jailbreak robustness across 19 attacks, 29 defenses, and 19 evaluators on 14 models in a unified reproducible framework

Prompt Injection · nlp
2 citations
benchmark · arXiv · Oct 5, 2025

Read the Scene, Not the Script: Outcome-Aware Safety for LLMs

Rui Wu, Yihao Quan, Zeru Shi et al. · Rutgers University

Identifies 'consequence-blindness' in LLMs, benchmarks jailbreak and over-refusal failures across semantic/outcome risk mismatches, and fine-tunes defenses on consequence-aware data

Prompt Injection · nlp
1 citation · 1 influential