Zhe Li

attack arXiv Nov 20, 2025 · Nov 2025

Yige Li, Zhe Li, Wei Zhao et al. · Singapore Management University · The University of Melbourne +1 more

Automates LLM backdoor injection via LLM agents generating semantic triggers, achieving 90%+ success rate while evading state-of-the-art defenses

Model Poisoning Training Data Poisoning nlp

2 citations PDF Code

defense arXiv Nov 20, 2025 · Nov 2025

Wei Zhao, Zhe Li, Yige Li et al. · Singapore Management University

Defends VLMs against adversarial visual jailbreaks using two-level vector quantization as a discrete bottleneck

Input Manipulation Attack Prompt Injection vision nlp multimodal

1 citation PDF Code

tool arXiv Sep 26, 2025 · Sep 2025

Zhe Li, Wei Zhao, Yige Li et al. · Singapore Management University

Diagnostic tool tracing undesirable LLM behaviors to specific training samples via representation gradient analysis in activation space

Model Poisoning Data Poisoning Attack nlp

benchmark arXiv Dec 4, 2025 · Dec 2025

Wei Zhao, Zhe Li, Jun Sun · Singapore Management University

Surveys and benchmarks causality-based jailbreak attacks and defenses, showing safety mechanisms localize in 1–2% of LLM neurons

Model Poisoning Prompt Injection nlp

Papers in Database (4)