Wei Zhao

h-index: 4 · 97 citations · 11 papers (total)

Papers in Database (4)

attack · arXiv · Nov 20, 2025

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Yige Li, Zhe Li, Wei Zhao et al. · Singapore Management University · The University of Melbourne +1 more

Automates backdoor injection into LLMs via LLM agents that generate semantic triggers, achieving over 90% attack success while evading state-of-the-art defenses

Model Poisoning · Training Data Poisoning · nlp
2 citations · PDF · Code
defense · arXiv · Nov 20, 2025

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

Wei Zhao, Zhe Li, Yige Li et al. · Singapore Management University

Defends vision-language models against adversarial visual jailbreaks using a two-level vector quantization scheme as a discrete bottleneck

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
1 citation · PDF · Code
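A minimal sketch of the discrete-bottleneck idea this entry describes: continuous feature vectors are snapped to their nearest codebook entries, so small (potentially adversarial) perturbations are absorbed before they reach the language model. This is a toy illustration with arbitrary codebook size, dimensions, and data, not the Q-MLLM implementation:

```python
# Vector quantization as a discrete bottleneck (toy sketch).
# Each continuous feature vector is replaced by its nearest codebook
# entry, discarding fine-grained perturbations.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # 16 hypothetical code vectors, dim 4

def quantize(features: np.ndarray) -> np.ndarray:
    """Replace each row with its nearest codebook entry (L2 distance)."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx]

x = rng.normal(size=(3, 4))                   # continuous "visual" features
x_adv = x + 1e-3 * rng.normal(size=x.shape)   # tiny adversarial perturbation
# After quantization the tiny perturbation is typically absorbed:
print(np.array_equal(quantize(x), quantize(x_adv)))
```

Every output row lies exactly on the codebook, which is what makes the bottleneck discrete.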
benchmark · arXiv · Dec 4, 2025

SoK: A Comprehensive Causality Analysis Framework for Large Language Model Security

Wei Zhao, Zhe Li, Jun Sun · Singapore Management University

Surveys and benchmarks causality-based jailbreak attacks and defenses, showing that safety mechanisms localize to 1–2% of LLM neurons

Model Poisoning · Prompt Injection · nlp
PDF · Code
tool · arXiv · Sep 26, 2025

Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing

Zhe Li, Wei Zhao, Yige Li et al. · Singapore Management University

Diagnostic tool that traces undesirable LLM behaviors back to specific training samples via representation-gradient analysis in activation space

Model Poisoning · Data Poisoning Attack · nlp
PDF · Code
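The attribution idea behind this entry can be sketched as a gradient-similarity ranking: score each training sample by how closely its loss gradient (in some representation space) aligns with the gradient induced by an undesirable output. The data and setup below are hypothetical placeholders, not the paper's actual method:

```python
# Toy gradient-tracing attribution: rank training samples by cosine
# similarity between their (pretend) representation-space gradients
# and the gradient induced by an undesirable behavior.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two gradient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
train_grads = rng.normal(size=(5, 8))  # stand-in gradients for 5 samples
# Construct a query whose behavior plausibly traces to sample 3:
query_grad = train_grads[3] + 0.1 * rng.normal(size=8)

scores = [cosine(g, query_grad) for g in train_grads]
print(int(np.argmax(scores)))  # index of the most responsible sample
```

Here the top-ranked sample is the one whose gradient direction dominates the query, which is the intuition behind attributing a behavior to its training data.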