Litian Zhang

survey arXiv Jan 7, 2026 · Jan 2026

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Zejian Chen, Chaozhuo Li, Chao Li et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Surveys LLM and VLM jailbreak attacks and defenses, proposing a unified three-layer defense framework across text and multimodal settings

Input Manipulation Attack · Prompt Injection · nlp · multimodal

1 citation

defense arXiv Dec 3, 2025 · Dec 2025

From static to adaptive: immune memory-based jailbreak detection for large language models

Jun Leng, Yu Liu, Litian Zhang et al. · Beijing University of Posts and Telecommunications · Hunan Branch of National Computer Network Emergency Response +1 more

Adaptive jailbreak detection for LLMs using immune memory retrieval and dual-agent simulation to counter evolving attacks

Prompt Injection · nlp

benchmark arXiv Jan 4, 2026 · Jan 2026

How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference

Songyang Liu, Chaozhuo Li, Rui Pu et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Proposes a fine-grained jailbreak evaluation framework that corrects a 27% overestimation of attack success in existing LLM safety benchmarks

Prompt Injection · nlp
