Yutao Wu

h-index: 1 · 20 citations · 7 papers (total)

Papers in Database (5)

benchmark · arXiv · Jan 8, 2026 · 12w ago

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Yunhao Feng, Yige Li, Yutao Wu et al. · Fudan University · Alibaba Group +4 more

Benchmark framework systematizing backdoor attacks across planning, memory, and tool-use stages of LLM agent workflows

Model Poisoning Excessive Agency nlp multimodal
1 citation PDF Code
benchmark · arXiv · Jan 15, 2026 · 11w ago

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Xingjun Ma, Yixu Wang, Hengyuan Xu et al. · Fudan University · Shanghai Innovation Institute +2 more

Benchmarks six frontier LLMs/VLMs on adversarial, multilingual, and compliance safety, revealing that all collapse to below 6% worst-case safety rates

Prompt Injection nlp multimodal vision generative
1 citation PDF
attack · arXiv · Oct 11, 2025

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

Yutao Wu, Xiao Liu, Yinghui Li et al. · Deakin University · Fudan University +1 more

Poisons RAG knowledge bases with a few adversarial documents to flip LLM fact-checking decisions at 86% ASR, remaining black-box and transfer-robust.

Data Poisoning Attack Prompt Injection nlp
PDF
attack · arXiv · Jan 29, 2026 · 9w ago

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Xiang Zheng, Yutao Wu, Hanxun Huang et al. · City University of Hong Kong · Deakin University +4 more

Self-evolving agent framework extracts hidden system prompts from 41 commercial LLMs using UCB-guided natural language probing strategies

Sensitive Information Disclosure Prompt Injection nlp
PDF
attack · arXiv · Feb 1, 2026 · 9w ago

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Kaiyuan Cui, Yige Li, Yutao Wu et al. · The University of Melbourne · Singapore Management University +2 more

Adversarial image attack jailbreaks VLMs with universal cross-target and cross-model transferability using a single surrogate model

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF Code