Latest papers

35 papers
attack arXiv Apr 2, 2026 · 4d ago

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models with 96.7% task failure rates

Input Manipulation Attack vision multimodal nlp
PDF Code
benchmark arXiv Mar 30, 2026 · 7d ago

Evaluating Privilege Usage of Agents on Real-World Tools

Quan Zhang, Lianhang Fu, Lvsi Lian et al. · East China Normal University · Xinjiang University +1 more

Benchmark evaluating LLM agents' privilege control under prompt injection attacks using real-world tools, finding 84.80% attack success

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Mar 25, 2026 · 12d ago

AMIF: Authorizable Medical Image Fusion Model with Built-in Authentication

Jie Song, Jun Jia, Wei Sun et al. · Macao Polytechnic University · Shanghai Jiao Tong University +2 more

Medical image fusion model embedding visible copyright watermarks in outputs, removable only with authentication keys

Model Theft Output Integrity Attack vision multimodal
PDF
defense arXiv Mar 25, 2026 · 12d ago

DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Hongyi Miao, Jun Jia, Xincheng Wang et al. · Shandong University · Shanghai Jiao Tong University +4 more

Data poisoning defense that protects private photo datasets from VLM fine-tuning attacks that extract identity-affiliation relationships

Data Poisoning Attack Sensitive Information Disclosure vision nlp multimodal
PDF
defense arXiv Feb 10, 2026 · 7w ago

AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models

Yue Li, Xin Yi, Dongsheng Shi et al. · East China Normal University · Hasso Plattner Institute +1 more

Attention-guided dynamic watermarking for LVLM outputs that preserves visual fidelity while achieving 99.36% AUC detection accuracy

Output Integrity Attack nlp multimodal vision
PDF
attack arXiv Feb 7, 2026 · 8w ago

Reverse-Engineering Model Editing on Language Models

Zhiyu Sun, Minrui Luo, Yu Wang et al. · Shanghai Qi Zhi Institute · East China Normal University +3 more

Recovers private edited data from LLM parameter update matrices using spectral analysis and entropy-based prompt reconstruction

Model Inversion Attack Sensitive Information Disclosure nlp
PDF Code
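The entry above describes recovering edited data from parameter update matrices via spectral analysis. As a minimal sketch of why that is plausible (not the paper's actual method): locate-and-edit techniques apply a low-rank update to a weight matrix, and an SVD of the checkpoint diff exposes the edited key/value directions. All shapes and the rank-one edit form below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-edit weight matrix of a toy "layer".
W_before = rng.normal(size=(64, 64))

# A rank-one edit in the style of locate-and-edit methods:
# push key vector k toward value vector v (illustrative form).
k = rng.normal(size=64)
v = rng.normal(size=64)
delta = np.outer(v, k) / np.dot(k, k)   # rank-one update
W_after = W_before + delta

# An auditor who sees both checkpoints can diff them and run SVD:
# the top singular vectors expose the edited key/value directions.
U, s, Vt = np.linalg.svd(W_after - W_before)
recovered_k = Vt[0]    # top right singular vector ~ key direction
recovered_v = U[:, 0]  # top left singular vector ~ value direction

# Cosine similarity (up to sign) with the true edit directions.
cos_k = abs(recovered_k @ k) / np.linalg.norm(k)
cos_v = abs(recovered_v @ v) / np.linalg.norm(v)
print(f"rank of update: {int(np.sum(s > 1e-8))}")    # → 1
print(f"cosine with true key:   {cos_k:.3f}")        # → 1.000
print(f"cosine with true value: {cos_v:.3f}")        # → 1.000
```

Because the update is exactly rank one here, recovery is exact; the paper additionally reconstructs prompts, which this toy does not attempt.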
defense arXiv Jan 30, 2026 · 9w ago

VocBulwark: Towards Practical Generative Speech Watermarking via Additional-Parameter Injection

Weizhi Liu, Yue Li, Zhaoxia Yin · East China Normal University · Huaqiao University

Injects adapter parameters into speech vocoders to embed robust, high-fidelity watermarks in AI-generated audio for provenance tracking

Output Integrity Attack audio generative
PDF
defense arXiv Jan 19, 2026 · 11w ago

LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion

Guanghao Zhou, Panjia Qiu, Cen Chen et al. · East China Normal University · Ant Group

Post-hoc LLM safety re-alignment via low-rank safety subspace fusion to restore guardrails degraded by fine-tuning

Transfer Learning Attack Prompt Injection nlp
3 citations 1 influential PDF
defense arXiv Dec 20, 2025 · Dec 2025

Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks

Yucheng Fan, Jiawei Chen, Yu Tian et al. · East China Normal University · Zhongguancun Academy +1 more

Adversarial image perturbations shield social-media photos from VLM-based private attribute inference while preserving visual quality

Input Manipulation Attack vision multimodal
PDF
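The shielding idea above rests on standard adversarial perturbations: nudge pixels against the gradient of an attribute classifier's logit under a small L-infinity budget, so the attribute prediction degrades while the image looks unchanged. A toy FGSM-style sketch with a stand-in linear classifier (all names and shapes are illustrative, not the paper's pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in private-attribute classifier: a linear logit over
# flattened pixels in [0, 1].
w = rng.normal(size=(32 * 32,))
image = rng.uniform(0.0, 1.0, size=(32 * 32,))

def logit(x):
    return float(w @ x)

# FGSM-style shield: step against the sign of the logit's gradient
# under an L-infinity budget eps, so the edit stays imperceptible.
eps = 4.0 / 255.0
grad = w  # gradient of a linear logit is just w
shielded = np.clip(image - eps * np.sign(grad), 0.0, 1.0)

print(f"logit before: {logit(image):+.2f}")
print(f"logit after:  {logit(shielded):+.2f}")
print(f"max pixel change: {np.abs(shielded - image).max():.4f}")
```

Against a real VLM the gradient comes from backpropagation through the model (or a surrogate), and the objective targets the attribute-inference output rather than a single logit.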
benchmark arXiv Dec 17, 2025 · Dec 2025

MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers

Xuanjun Zong, Zhiqi Shen, Lei Wang et al. · East China Normal University · Salesforce AI Research +2 more

Benchmark of 20 MCP attack types across 5 real-world domains revealing escalating LLM agent safety gaps in multi-step tool-use workflows

Insecure Plugin Design Excessive Agency nlp
4 citations PDF Code
defense TIFS Dec 17, 2025 · Dec 2025

ArcGen: Generalizing Neural Backdoor Detection Across Diverse Architectures

Zhonghao Yang, Cheng Luo, Daojing He et al. · East China Normal University · Harbin Institute of Technology +2 more

Defends against backdoor attacks by learning architecture-invariant model features for robust detection across unseen model architectures

Model Poisoning vision
PDF Code
defense arXiv Nov 29, 2025 · Nov 2025

SAIDO: Generalizable Detection of AI-Generated Images via Scene-Aware and Importance-Guided Dynamic Optimization in Continual Learning

Yongkang Hu, Yu Cheng, Yushuo Zhang et al. · East China Normal University · Shanghai Innovation Institute

Continual-learning detection framework for AI-generated images using scene-aware expert modules and gradient-projection to prevent forgetting

Output Integrity Attack vision
PDF
defense arXiv Nov 25, 2025 · Nov 2025

DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models

Jun Jia, Hongyi Miao, Yingjie Zhou et al. · Shanghai Jiao Tong University · Shandong University +2 more

Defends facial images from diffusion model customization by adding dual-layer adversarial perturbations that disrupt both fine-tuning and zero-shot identity generation

Output Integrity Attack vision generative
PDF
defense arXiv Nov 25, 2025 · Nov 2025

Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image Generation

Jun Jia, Hongyi Miao, Yingjie Zhou et al. · Shandong University · Shanghai Jiao Tong University +2 more

Adversarial perturbation defense that disrupts zero-shot diffusion generation of faces and styles while permitting authenticated access via reversible embedding encryption

Input Manipulation Attack Output Integrity Attack vision generative
PDF
benchmark arXiv Nov 24, 2025 · Nov 2025

Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

Xincheng Wang, Hanchi Sun, Wenjun Sun et al. · Donghua University · Shanghai Jiao Tong University +3 more

Benchmarks dataset watermarking schemes for diffusion model traceability and proposes a removal attack that fully defeats them

Output Integrity Attack vision generative
PDF
attack arXiv Nov 20, 2025 · Nov 2025

"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios

Zhen Sun, Zongmin Zhang, Deqi Liang et al. · The Hong Kong University of Science and Technology · East China Normal University +5 more

Game-theoretic black-box jailbreak using Prisoner's Dilemma scenarios to flip LLM safety preferences, achieving 95%+ ASR on GPT-4o and DeepSeek-R1

Prompt Injection nlp
2 citations PDF Code
defense arXiv Nov 18, 2025 · Nov 2025

Unified Defense for Large Language Models against Jailbreak and Fine-Tuning Attacks in Education

Xin Yi, Yue Li, Dongsheng Shi et al. · East China Normal University

Three-stage defense framework for educational LLMs that resists both jailbreak and fine-tuning safety-removal attacks

Transfer Learning Attack Prompt Injection nlp
1 citation PDF
defense arXiv Nov 18, 2025 · Nov 2025

Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection

Zhengchunmin Dai, Jiaxiong Tang, Peng Sun et al. · East China Normal University · Hunan University +1 more

Embeds ownership watermarks into client models via server-side gradient injection in split federated learning to defend against model theft

Model Theft vision federated-learning
PDF
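The summary above describes a server that injects a watermark objective into the gradients it returns to clients. A minimal sketch of the injection idea on a toy linear model, assuming the client blindly applies server-computed gradients (the trigger/secret setup and all names are illustrative, not Sigil's construction):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear client model in a split-learning-style setup where the
# client applies gradients computed on the server side. The server
# covertly adds a watermark objective: a fixed trigger input should
# elicit a fixed secret response.
dim = 16
W = rng.normal(size=(dim,)) * 0.1
trigger = rng.normal(size=(dim,))
trigger /= np.linalg.norm(trigger)  # unit-norm trigger
secret = 1.0                        # target response on the trigger

lr = 0.2
for _ in range(300):
    # Stand-in task gradient (weight decay toward zero).
    task_grad = 0.01 * W
    # Gradient of the watermark loss 0.5 * (W @ trigger - secret)**2.
    wm_grad = (W @ trigger - secret) * trigger
    # Injection: the client receives and applies the summed gradient.
    W -= lr * (task_grad + wm_grad)

# Ownership verification: query the suspect model with the trigger.
response = W @ trigger
print(f"response on trigger: {response:.3f} (secret target {secret})")
```

The point of the sketch is that the client never sees a separate watermark loss, only gradients, yet the trained model reproducibly answers the trigger with the secret response.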
defense arXiv Nov 17, 2025 · Nov 2025

Robust Client-Server Watermarking for Split Federated Learning

Jiaxiong Tang, Zhengchunmin Dai, Liantao Wu et al. · East China Normal University · Hunan University +1 more

Embeds asymmetric client-server watermarks into split federated learning models to prove joint ownership and resist removal attacks

Model Theft federated-learning
PDF
defense arXiv Nov 16, 2025 · Nov 2025

Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection

Jiayi Zhu, Yihao Huang, Yue Cao et al. · Xidian University · Ltd +5 more

Defends geo-privacy by embedding semantics-aware deceptive text overlays around images to mislead LVLMs into predicting wrong geolocations

Input Manipulation Attack Prompt Injection vision multimodal
PDF