Zhe Li

h-index: 4 · 98 citations · 12 papers (total)

Papers in Database (4)

attack arXiv Nov 20, 2025 · Nov 2025

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Yige Li, Zhe Li, Wei Zhao et al. · Singapore Management University · The University of Melbourne +1 more

Automates LLM backdoor injection by using LLM agents to generate semantic triggers, achieving a success rate above 90% while evading state-of-the-art defenses

Model Poisoning · Training Data Poisoning · nlp
2 citations
defense arXiv Nov 20, 2025 · Nov 2025

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

Wei Zhao, Zhe Li, Yige Li et al. · Singapore Management University

Defends VLMs against adversarial visual jailbreaks using two-level vector quantization as a discrete bottleneck

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
1 citation
tool arXiv Sep 26, 2025 · Sep 2025

Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing

Zhe Li, Wei Zhao, Yige Li et al. · Singapore Management University

Diagnostic tool that traces undesirable LLM behaviors to specific training samples via representation gradient analysis in activation space

Model Poisoning · Data Poisoning Attack · nlp
benchmark arXiv Dec 4, 2025 · Dec 2025

SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security

Wei Zhao, Zhe Li, Jun Sun · Singapore Management University

Surveys and benchmarks causality-based jailbreak attacks and defenses, showing safety mechanisms localize in 1–2% of LLM neurons

Model Poisoning · Prompt Injection · nlp