Yige Li

attack arXiv Nov 20, 2025 · Nov 2025

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Yige Li, Zhe Li, Wei Zhao et al. · Singapore Management University · The University of Melbourne +1 more

Automates LLM backdoor injection via LLM agents generating semantic triggers, achieving 90%+ success rate while evading state-of-the-art defenses

Model Poisoning Training Data Poisoning nlp

2 citations PDF Code

benchmark arXiv Jan 8, 2026 · 12w ago

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Yunhao Feng, Yige Li, Yutao Wu et al. · Fudan University · Alibaba Group +4 more

Benchmark framework systematizing backdoor attacks across planning, memory, and tool-use stages of LLM agent workflows

Model Poisoning Excessive Agency nlpmultimodal

1 citations PDF Code

defense arXiv Nov 20, 2025 · Nov 2025

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

Wei Zhao, Zhe Li, Yige Li et al. · Singapore Management University

Defends VLMs against adversarial visual jailbreaks using two-level vector quantization as a discrete bottleneck

Input Manipulation Attack Prompt Injection visionnlpmultimodal

1 citations PDF Code

benchmark arXiv Nov 15, 2025 · Nov 2025

AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models

Jiayu Li, Yunhan Zhao, Xiang Zheng et al. · Fudan University · City University of Hong Kong +1 more

Benchmarks adversarial and backdoor attacks on robotic VLA models; introduces BackdoorVLA for precise long-horizon targeted manipulation with 100% success on select tasks

Input Manipulation Attack Model Poisoning visionmultimodalreinforcement-learning

1 citations PDF

attack arXiv Feb 1, 2026 · 9w ago

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Kaiyuan Cui, Yige Li, Yutao Wu et al. · The University of Melbourne · Singapore Management University +2 more

Adversarial image attack jailbreaks VLMs with universal cross-target and cross-model transferability using a single surrogate model

Input Manipulation Attack Prompt Injection visionnlpmultimodal

PDF Code

benchmark arXiv Nov 24, 2025 · Nov 2025

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models

Juncheng Li, Yige Li, Hanxun Huang et al. · Fudan University · Singapore Management University +1 more

Benchmarks backdoor attacks on VLMs, finding text triggers achieve 90%+ success at just 1% poisoning rate

Model Poisoning visionnlpmultimodal

PDF Code

tool arXiv Sep 26, 2025 · Sep 2025

Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing

Zhe Li, Wei Zhao, Yige Li et al. · Singapore Management University

Diagnostic tool tracing undesirable LLM behaviors to specific training samples via representation gradient analysis in activation space

Model Poisoning Data Poisoning Attack nlp

PDF Code

attack arXiv Jan 29, 2026 · 9w ago

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Xiang Zheng, Yutao Wu, Hanxun Huang et al. · City University of Hong Kong · Deakin University +4 more

Self-evolving agent framework extracts hidden system prompts from 41 commercial LLMs using UCB-guided natural language probing strategies

Sensitive Information Disclosure Prompt Injection nlp

PDF

Papers in Database (8)