Yutao Wu

h-index: 1 · 20 citations · 7 papers (total)

Papers in Database (5)

benchmark · arXiv · Jan 8, 2026 · 12w ago

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Yunhao Feng, Yige Li, Yutao Wu et al. · Fudan University · Alibaba Group +4 more

Benchmark framework systematizing backdoor attacks across planning, memory, and tool-use stages of LLM agent workflows

Model Poisoning Excessive Agency nlp multimodal
1 citation PDF Code
benchmark · arXiv · Jan 15, 2026 · 11w ago

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Xingjun Ma, Yixu Wang, Hengyuan Xu et al. · Fudan University · Shanghai Innovation Institute +2 more

Benchmarks six frontier LLMs/VLMs on adversarial, multilingual, and compliance safety, revealing that all collapse to below 6% worst-case safety rates

Prompt Injection nlp multimodal vision generative
1 citation PDF
attack · arXiv · Oct 11, 2025

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

Yutao Wu, Xiao Liu, Yinghui Li et al. · Deakin University · Fudan University +1 more

Poisons RAG knowledge bases with a few adversarial documents to flip LLM fact-checking decisions at 86% ASR, remaining black-box and transfer-robust.

Data Poisoning Attack Prompt Injection nlp
PDF
attack · arXiv · Jan 29, 2026 · 9w ago

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Xiang Zheng, Yutao Wu, Hanxun Huang et al. · City University of Hong Kong · Deakin University +4 more

Self-evolving agent framework extracts hidden system prompts from 41 commercial LLMs using UCB-guided natural language probing strategies

Sensitive Information Disclosure Prompt Injection nlp
PDF
attack · arXiv · Feb 1, 2026 · 9w ago

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Kaiyuan Cui, Yige Li, Yutao Wu et al. · The University of Melbourne · Singapore Management University +2 more

Adversarial image attack jailbreaks VLMs with universal cross-target and cross-model transferability using a single surrogate model

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF Code