ML Security Papers

Latest papers

3 papers

survey · arXiv · Jan 7, 2026

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Zejian Chen, Chaozhuo Li, Chao Li et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Surveys LLM and VLM jailbreak attacks and defenses, proposing a unified three-layer defense framework across text and multimodal settings

Input Manipulation Attack · Prompt Injection · nlp · multimodal

1 citation

benchmark · arXiv · Jan 4, 2026

How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference

Songyang Liu, Chaozhuo Li, Rui Pu et al. · Beijing University of Posts and Telecommunications · China Academy of Information and Communications Technology

Proposes a fine-grained jailbreak evaluation framework that corrects a 27% overestimation of attack success in existing LLM safety benchmarks

Prompt Injection · nlp

defense · arXiv · Aug 31, 2025

Unlocking the Effectiveness of LoRA-FP for Seamless Transfer Implantation of Fingerprints in Downstream Models

Zhenhua Xu, Zhaokun Yan, Binhan Xu et al. · Zhejiang University · China Academy of Information and Communications Technology +3 more

Embeds backdoor ownership fingerprints into LoRA adapters for lightweight, transferable LLM IP protection across downstream models

Model Theft (OWASP ML Top 10) · Model Theft (OWASP LLM Top 10) · nlp
