Latest papers

3 papers
survey arXiv Jan 7, 2026 · 12w ago

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Zejian Chen, Chaozhuo Li, Chao Li et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Surveys LLM and VLM jailbreak attacks and defenses, proposing a unified three-layer defense framework across text and multimodal settings

Input Manipulation Attack Prompt Injection nlpmultimodal
1 citations PDF
benchmark arXiv Jan 4, 2026 · Jan 2026

How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference

Songyang Liu, Chaozhuo Li, Rui Pu et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Proposes fine-grained jailbreak evaluation framework that corrects 27% overestimation of attack success in existing LLM safety benchmarks

Prompt Injection nlp
PDF
defense arXiv Aug 31, 2025 · Aug 2025

Unlocking the Effectiveness of LoRA-FP for Seamless Transfer Implantation of Fingerprints in Downstream Models

Zhenhua Xu, Zhaokun Yan, Binhan Xu et al. · Zhejiang University · China Academy of Information and Communications Technology +3 more

Embeds backdoor ownership fingerprints into LoRA adapters for lightweight, transferable LLM IP protection across downstream models

Model Theft Model Theft nlp
PDF Code