Latest papers

11 papers
attack arXiv Mar 25, 2026 · 12d ago

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search

Yulin Shen, Xudong Pan, Geng Hong et al. · Fudan University · Shanghai Innovation Institute

Black-box tree-search attack generating stealthy injection payloads that hijack MCP-enabled LLM agents through manipulated tool responses

Prompt Injection Insecure Plugin Design nlp
PDF
survey arXiv Mar 2, 2026 · 5w ago

From Secure Agentic AI to Secure Agentic Web: Challenges, Threats, and Future Directions

Zhihang Deng, Jiaping Gui, Weinan Zhang · Shanghai Innovation Institute · Shanghai Jiao Tong University

Surveys prompt injection, toolchain abuse, and agent network threats across LLM agentic systems and web-scale deployments

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Jan 21, 2026 · 10w ago

INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems

Yijin Zhou, Xiaoya Lu, Dongrui Liu et al. · Shanghai Jiao Tong University · Shanghai Artificial Intelligence Laboratory +1 more

Defends LLM multi-agent systems from viral malicious propagation by detecting and rehabilitating infected agents with topological constraints

Prompt Injection Excessive Agency nlp
PDF Code
defense arXiv Jan 19, 2026 · 11w ago

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

Wenqi Zhang, Yulin Shen, Changyue Jiang et al. · Fudan University · Shanghai Innovation Institute

Defends LLM computer-use agents against prompt/visual injection by training on simulated unsafe GUI trajectories to correct reasoning chains

Prompt Injection Excessive Agency nlp vision multimodal
PDF Code
benchmark arXiv Jan 15, 2026 · 11w ago

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Xingjun Ma, Yixu Wang, Hengyuan Xu et al. · Fudan University · Shanghai Innovation Institute +2 more

Benchmarks six frontier LLMs/VLMs on adversarial, multilingual, and compliance safety, revealing that all collapse to below 6% worst-case safety rates

Prompt Injection nlp multimodal vision generative
1 citation PDF
benchmark arXiv Jan 13, 2026 · 11w ago

WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents

Xinyi Wu, Jiagui Chen, Geng Hong et al. · Fudan University · Shanghai Innovation Institute

Automated benchmark with 1,226 tasks evaluating LLM web agent security across prompt injection and excessive agency risks

Prompt Injection Excessive Agency nlp
PDF Code
defense arXiv Jan 12, 2026 · 12w ago

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

Xinyi Wu, Geng Hong, Yueyue Chen et al. · Fudan University · Zhongguancun Laboratory +2 more

Discovers social engineering attacks that hijack LLM web agents via malicious webpage content; proposes a runtime defense reducing attack success by 78%

Prompt Injection Excessive Agency nlp
1 citation PDF
benchmark arXiv Jan 8, 2026 · 12w ago

BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents

Yunhao Feng, Yige Li, Yutao Wu et al. · Fudan University · Alibaba Group +4 more

Benchmark framework systematizing backdoor attacks across planning, memory, and tool-use stages of LLM agent workflows

Model Poisoning Excessive Agency nlp multimodal
1 citation PDF Code
defense arXiv Nov 29, 2025 · Nov 2025

SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning

Yongkang Hu, Yu Cheng, Yushuo Zhang et al. · East China Normal University · Shanghai Innovation Institute

Continual-learning detection framework for AI-generated images using scene-aware expert modules and gradient-projection to prevent forgetting

Output Integrity Attack vision
PDF
tool arXiv Oct 10, 2025 · Oct 2025

Provable Training Data Identification for Large Language Models

Zhenlong Liu, Hao Zeng, Weiran Huang et al. · Southern University of Science and Technology · Shanghai Innovation Institute +1 more

Set-level membership inference for LLMs with provable false identification rate control via conformal p-values and the BH procedure

Membership Inference Attack nlp
PDF
defense arXiv Oct 8, 2025 · Oct 2025

AWM: Accurate Weight-Matrix Fingerprint for Large Language Models

Boyi Zeng, Lin Chen, Ziwei He et al. · Shanghai Jiao Tong University · Fudan University +1 more

Training-free LLM weight-matrix fingerprinting detects model lineage with perfect AUC, robust to six post-training modification types

Model Theft nlp
PDF Code