Latest papers

72 papers
defense arXiv Apr 2, 2026 · 4d ago

From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Yiheng Huang, Zhijia Zhao, Bihuan Chen et al. · Fudan University

Constructs dataset of 114 malicious MCP servers exploiting LLM tool-calling and proposes behavioral deviation detector achieving 94.6% F1

Insecure Plugin Design nlp
PDF
attack arXiv Apr 1, 2026 · 5d ago

Adversarial Attenuation Patch Attack for SAR Object Detection

Yiming Zhang, Weibo Qin, Feng Wang · Fudan University

Adversarial patch attack on SAR target detection achieving stealthiness and physical realizability through energy-constrained optimization

Input Manipulation Attack vision
PDF Code
attack arXiv Mar 25, 2026 · 12d ago

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search

Yulin Shen, Xudong Pan, Geng Hong et al. · Fudan University · Shanghai Innovation Institute

Black-box tree-search attack generating stealthy injection payloads that hijack MCP-enabled LLM agents through manipulated tool responses

Prompt Injection Insecure Plugin Design nlp
PDF
tool arXiv Mar 23, 2026 · 14d ago

VIGIL: Part-Grounded Structured Reasoning for Generalizable Deepfake Detection

Xinghan Li, Junhao Xu, Jingjing Chen · Fudan University

Interpretable deepfake detector using multimodal LLMs with part-grounded forensic reasoning and structured evidence verification

Output Integrity Attack visionmultimodalgenerative
PDF Code
benchmark arXiv Mar 8, 2026 · 29d ago

Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs

Yige Li, Wei Zhao, Zhe Li et al. · Singapore Management University · The University of Melbourne +1 more

Benchmarks beneficial uses of LLM backdoors for safety enforcement, access control, and watermarking via trigger conditioning

Model Poisoning Prompt Injection nlp
PDF Code
attack arXiv Mar 4, 2026 · 4w ago

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Junchen Li, Chao Qi, Rongzheng Wang et al. · University of Electronic Science and Technology of China · Fudan University +1 more

Poisons RAG knowledge bases with alignment-exploiting documents that transfer blocking attacks across 7 LLMs with 96% success

Data Poisoning Attack Prompt Injection nlp
PDF
defense arXiv Mar 2, 2026 · 5w ago

RA-Det: Towards Universal Detection of AI-Generated Images via Robustness Asymmetry

Xinchang Wang, Yunhao Chen, Yuechen Zhang et al. · Jiangnan University · Fudan University

Detects AI-generated images by exploiting feature drift asymmetry between real and synthetic images under structured perturbations

Output Integrity Attack vision
PDF Code
benchmark arXiv Feb 25, 2026 · 5w ago

Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models

Zheyuan Gu, Qingsong Zhao, Yusong Wang et al. · China Telecom · Peking University +1 more

Proposes FAQ benchmark to evaluate VLMs on temporal deepfake detection via three-level forensic reasoning hierarchy

Output Integrity Attack visionmultimodal
PDF
defense arXiv Feb 25, 2026 · 5w ago

Leveraging large multimodal models for audio-video deepfake detection: a pilot study

Songjun Cao, Yuqi Li, Yunpeng Luo et al. · Tencent Youtu Lab · Fudan University

Fine-tunes Qwen 2.5 Omni as a unified audio-visual deepfake detector via two-stage LoRA and encoder fine-tuning

Output Integrity Attack multimodalaudiovision
PDF
defense arXiv Feb 10, 2026 · 7w ago

Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Kun Wang, Zherui Li, Zhenhong Zhou et al. · Nanyang Technological University · Beijing University of Posts and Telecommunications +4 more

Exposes cross-modal jailbreak vulnerabilities in omni-modal LLMs and defends via SVD-guided refusal vector amplification with lightweight adapters

Prompt Injection multimodalnlp
PDF Code
defense arXiv Feb 4, 2026 · 8w ago

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

Zhuosen Bao, Xia Du, Zheng Lin et al. · Xiamen University of Technology · University of Hong Kong +8 more

Generates unrestricted adversarial faces using diffusion models to evade facial recognition with 99% black-box success rate

Input Manipulation Attack visiongenerative
PDF
benchmark arXiv Feb 3, 2026 · 8w ago

CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs

Yuxuan Liu, Yuntian Shi, Kun Wang et al. · Zhejiang University · Fudan University +1 more

Benchmark exposing cross-modal safety gaps in 16 VLMs via image-text combinations that bypass or confuse safety alignment

Prompt Injection multimodalvisionnlp
PDF
attack arXiv Feb 3, 2026 · 8w ago

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

Tianxin Chen, Wenbo Jiang, Hongqiao Chen et al. · Fudan University · University of Electronic Science and Technology of China +1 more

Backdoor attack on T2I diffusion models using semantic-space triggers that evade enumeration and attention-consistency defenses with 100% ASR

Model Poisoning visionnlpgenerativemultimodal
PDF
defense arXiv Feb 3, 2026 · 8w ago

SEW: Strengthening Robustness of Black-box DNN Watermarking via Specificity Enhancement

Huming Qiu, Mi Zhang, Junjie Sun et al. · Fudan University · Alibaba Group

Defends DNN model ownership watermarks against removal attacks by reducing watermark association with approximate reverse-engineered keys

Model Theft vision
PDF
attack arXiv Feb 1, 2026 · 9w ago

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Kaiyuan Cui, Yige Li, Yutao Wu et al. · The University of Melbourne · Singapore Management University +2 more

Adversarial image attack jailbreaks VLMs with universal cross-target and cross-model transferability using a single surrogate model

Input Manipulation Attack Prompt Injection visionnlpmultimodal
PDF Code
defense arXiv Jan 30, 2026 · 9w ago

Color Matters: Demosaicing-Guided Color Correlation Training for Generalizable AI-Generated Image Detection

Nan Zhong, Yiran Xu, Mian Zou · City University of Hong Kong · Fudan University +1 more

Detects AI-generated images via camera CFA color correlations, achieving state-of-the-art generalization across 20+ unseen generators

Output Integrity Attack vision
PDF
attack arXiv Jan 30, 2026 · 9w ago

From Similarity to Vulnerability: Key Collision Attack on LLM Semantic Caching

Zhixiang Zhang, Zesen Liu, Yuchong Xie et al. · The Hong Kong University of Science and Technology · Fudan University

CacheAttack exploits semantic cache collision vulnerabilities to hijack LLM responses at 86% success rate across major providers

Output Integrity Attack Prompt Injection nlp
PDF
attack arXiv Jan 29, 2026 · 9w ago

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Xiang Zheng, Yutao Wu, Hanxun Huang et al. · City University of Hong Kong · Deakin University +4 more

Self-evolving agent framework extracts hidden system prompts from 41 commercial LLMs using UCB-guided natural language probing strategies

Sensitive Information Disclosure Prompt Injection nlp
PDF
defense arXiv Jan 20, 2026 · 10w ago

SecureSplit: Mitigating Backdoor Attacks in Split Learning

Zhihao Dou, Dongfei Cui, Weida Wang et al. · Case Western Reserve University · Northeast Electric Power University +6 more

Defends split learning against backdoor attacks by transforming embeddings and filtering poisoned ones via majority-voting scheme

Model Poisoning visionfederated-learning
PDF
defense arXiv Jan 19, 2026 · 11w ago

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

Wenqi Zhang, Yulin Shen, Changyue Jiang et al. · Fudan University · Shanghai Innovation Institute

Defends LLM computer-use agents against prompt/visual injection by training on simulated unsafe GUI trajectories to correct reasoning chains

Prompt Injection Excessive Agency nlpvisionmultimodal
PDF Code
Loading more papers…