Xingjun Ma

attack arXiv Oct 6, 2025 · Oct 2025

Imperceptible Jailbreaking against Large Language Models

Kuofeng Gao, Yiming Li, Chao Du et al. · Tsinghua University · Sea AI Lab +3 more

Jailbreaks aligned LLMs using invisible Unicode variation selectors as adversarial suffixes, bypassing safety alignment with zero visible text modifications

Prompt Injection nlp

3 citations PDF Code

attack arXiv Nov 20, 2025 · Nov 2025

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Yige Li, Zhe Li, Wei Zhao et al. · Singapore Management University · The University of Melbourne +1 more

Automates LLM backdoor injection via LLM agents generating semantic triggers, achieving 90%+ success rate while evading state-of-the-art defenses

Model Poisoning Training Data Poisoning nlp

2 citations PDF Code

attack arXiv Feb 1, 2026 · 9w ago

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Kaiyuan Cui, Yige Li, Yutao Wu et al. · The University of Melbourne · Singapore Management University +2 more

Adversarial image attack jailbreaks VLMs with universal cross-target and cross-model transferability using a single surrogate model

Input Manipulation Attack Prompt Injection visionnlpmultimodal

PDF Code

benchmark arXiv Nov 24, 2025 · Nov 2025

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models

Juncheng Li, Yige Li, Hanxun Huang et al. · Fudan University · Singapore Management University +1 more

Benchmarks backdoor attacks on VLMs, finding text triggers achieve 90%+ success at just 1% poisoning rate

Model Poisoning visionnlpmultimodal

PDF Code

Papers in Database (4)

Imperceptible Jailbreaking against Large Language Models

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models