Wei Zhao

h-index: 4 · 97 citations · 11 papers (total)

Papers in Database (4)

attack · arXiv · Nov 20, 2025

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Yige Li, Zhe Li, Wei Zhao et al. · Singapore Management University · The University of Melbourne +1 more

Automates backdoor injection into LLMs via LLM agents that generate semantic triggers, achieving over 90% attack success while evading state-of-the-art defenses

Model Poisoning · Training Data Poisoning · nlp
2 citations · PDF · Code
defense · arXiv · Nov 20, 2025

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

Wei Zhao, Zhe Li, Yige Li et al. · Singapore Management University

Defends vision-language models against adversarial visual jailbreaks using a two-level vector quantization scheme as a discrete bottleneck

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
1 citation · PDF · Code
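A minimal sketch of the discrete-bottleneck idea this entry describes: continuous feature vectors are snapped to their nearest codebook entries, so small (potentially adversarial) perturbations are absorbed before they reach the language model. This is a toy illustration with arbitrary codebook size, dimensions, and data, not the Q-MLLM implementation:

```python
# Vector quantization as a discrete bottleneck (toy sketch).
# Each continuous feature vector is replaced by its nearest codebook
# entry, discarding fine-grained perturbations.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # 16 hypothetical code vectors, dim 4

def quantize(features: np.ndarray) -> np.ndarray:
    """Replace each row with its nearest codebook entry (L2 distance)."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx]

x = rng.normal(size=(3, 4))                   # continuous "visual" features
x_adv = x + 1e-3 * rng.normal(size=x.shape)   # tiny adversarial perturbation
# After quantization the tiny perturbation is typically absorbed:
print(np.array_equal(quantize(x), quantize(x_adv)))
```

Every output row lies exactly on the codebook, which is what makes the bottleneck discrete.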
benchmark · arXiv · Dec 4, 2025

SoK: A Comprehensive Causality Analysis Framework for Large Language Model Security

Wei Zhao, Zhe Li, Jun Sun · Singapore Management University

Surveys and benchmarks causality-based jailbreak attacks and defenses, showing that safety mechanisms localize to 1–2% of LLM neurons

Model Poisoning · Prompt Injection · nlp
PDF · Code
tool · arXiv · Sep 26, 2025

Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing

Zhe Li, Wei Zhao, Yige Li et al. · Singapore Management University

Diagnostic tool that traces undesirable LLM behaviors back to specific training samples via representation-gradient analysis in activation space

Model Poisoning · Data Poisoning Attack · nlp
PDF · Code
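The attribution idea behind this entry can be sketched as a gradient-similarity ranking: score each training sample by how closely its loss gradient (in some representation space) aligns with the gradient induced by an undesirable output. The data and setup below are hypothetical placeholders, not the paper's actual method:

```python
# Toy gradient-tracing attribution: rank training samples by cosine
# similarity between their (pretend) representation-space gradients
# and the gradient induced by an undesirable behavior.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two gradient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
train_grads = rng.normal(size=(5, 8))  # stand-in gradients for 5 samples
# Construct a query whose behavior plausibly traces to sample 3:
query_grad = train_grads[3] + 0.1 * rng.normal(size=8)

scores = [cosine(g, query_grad) for g in train_grads]
print(int(np.argmax(scores)))  # index of the most responsible sample
```

Here the top-ranked sample is the one whose gradient direction dominates the query, which is the intuition behind attributing a behavior to its training data.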