Chao Li

survey arXiv Jan 7, 2026

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Zejian Chen, Chaozhuo Li, Chao Li et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Surveys LLM and VLM jailbreak attacks and defenses, proposing a unified three-layer defense framework across text and multimodal settings

Input Manipulation Attack Prompt Injection nlp multimodal

1 citation

benchmark arXiv Jan 4, 2026

How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference

Songyang Liu, Chaozhuo Li, Rui Pu et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Proposes a fine-grained jailbreak evaluation framework that corrects a 27% overestimation of attack success in existing LLM safety benchmarks

Prompt Injection nlp
