Latest papers

32 papers
defense arXiv Mar 25, 2026

DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Hongyi Miao, Jun Jia, Xincheng Wang et al. · Shandong University · Shanghai Jiao Tong University +4 more

Data poisoning defense that protects private photo datasets from VLM fine-tuning attacks that extract identity-affiliation relationships

Data Poisoning Attack Sensitive Information Disclosure vision nlp multimodal
PDF
attack arXiv Mar 24, 2026

Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs

Wenyu Chen, Xiangtao Meng, Chuanchao Zang et al. · Shandong University

Token-aware jailbreak fuzzing that achieves 90% attack success with 70% fewer queries by prioritizing high-contribution tokens

Prompt Injection nlp
PDF
attack arXiv Mar 23, 2026

Thermal Topology Collapse: Universal Physical Patch Attacks on Infrared Vision Systems

Chengyin Hu, Yikun Guo, Yuxian Dong et al. · China University of Petroleum-Beijing · University of Electronic Science and Technology of China +3 more

Universal adversarial patch attack on infrared pedestrian detectors using parameterized Bézier curves and cold patches

Input Manipulation Attack vision
PDF
attack arXiv Mar 20, 2026

Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs

Qi Luo, Minghui Xu, Dongxiao Yu et al. · Shandong University

Text-only backdoor attack on graph neural networks that poisons node text while preserving graph structure, achieving near-perfect attack success rates

Model Poisoning Data Poisoning Attack nlp graph
PDF
attack arXiv Mar 18, 2026

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack vision generative
PDF
defense arXiv Mar 14, 2026

Towards Generalizable Deepfake Detection via Real Distribution Bias Correction

Ming-Hui Liu, Harry Cheng, Xin Luo et al. · Shandong University · National University of Singapore

Deepfake detector exploiting real image distribution invariance to generalize across unseen forgery types and domains

Output Integrity Attack vision
PDF
defense arXiv Mar 11, 2026

Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw

Zhengyang Shan, Jiayun Xin, Yue Zhang et al. · Shandong University

Analyzes LLM code agent vulnerabilities via 47 attack scenarios, then defends with human-in-the-loop tool-call interception, raising defense rates from 17% to 92%

Prompt Injection Excessive Agency nlp
PDF Code
benchmark arXiv Mar 8, 2026

Give Them an Inch and They Will Take a Mile: Understanding and Measuring Caller Identity Confusion in MCP-Based AI Systems

Yuhang Huang, Boyang Ma, Biwei Yan et al. · Shandong University · City University of Hong Kong

Large-scale empirical analysis reveals MCP servers fail to authenticate callers, enabling unauthorized tool access in LLM agent systems

Insecure Plugin Design nlp
PDF
attack arXiv Feb 11, 2026

When Skills Lie: Hidden-Comment Injection in LLM Agents

Qianli Wang, Boyang Ma, Minghui Xu et al. · Shandong University

Demonstrates hidden-comment prompt injection in LLM agent Skill documents, invisible to humans but followed by models, triggering malicious tool calls

Prompt Injection Insecure Plugin Design nlp
PDF
benchmark arXiv Feb 3, 2026

Don't believe everything you read: Understanding and Measuring MCP Behavior under Misleading Tool Descriptions

Zhihao Li, Boyang Ma, Xuelong Dai et al. · Shandong University

Measures description-code inconsistency across 10,240 MCP servers, finding that 13% enable undocumented privileged or unauthorized actions by LLM agents

Insecure Plugin Design nlp
PDF
attack arXiv Jan 29, 2026

ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses

Ningyuan He, Ronghong Huang, Qianqian Tang et al. · University of Science and Technology of China · Shandong University +1 more

Zero-query black-box text attacks evade LLM-based in-context learning classifiers with 95.3% success, paired with a joint defense recipe

Prompt Injection nlp
PDF Code
defense arXiv Jan 29, 2026

TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

Mingzu Liu, Hao Fang, Runmin Cong · Shandong University · Key Laboratory of Machine Intelligence and System Control

Defends MLLMs against fine-tuning backdoors by unsupervised detection of attention allocation divergence across instruction, vision, and query components

Model Poisoning vision nlp multimodal
PDF Code
attack arXiv Jan 27, 2026

GraphDLG: Exploring Deep Leakage from Gradients in Federated Graph Learning

Shuyue Wei, Wantong Chen, Tongyu Wei et al. · Shandong University · Beihang University +1 more

Gradient inversion attack on federated graph learning recovers private graph structure and node features from shared gradients via a closed-form recursive rule

Model Inversion Attack graph federated-learning
PDF
attack arXiv Jan 16, 2026

VidLeaks: Membership Inference Attacks Against Text-to-Video Models

Li Wang, Wenyu Chen, Ning Yu et al. · Shandong University · State Key Laboratory of Cryptography and Digital Economy Security +2 more

First MIA framework against text-to-video models exploiting sparse keyframe memorization and temporal consistency signals to infer training membership

Membership Inference Attack vision generative
PDF Code
attack arXiv Jan 2, 2026

Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems

Yueyan Dong, Minghui Xu, Qin Hu et al. · Shandong University · Guangdong University of Finance and Economics +2 more

Exploits LoRA's decoupled A/B matrix aggregation in federated LLM fine-tuning to inject stealthy malicious updates that degrade model quality while evading anomaly detectors

Data Poisoning Attack Transfer Learning Attack nlp federated-learning
PDF
attack IACR ePrint Dec 19, 2025

Cryptanalysis of Pseudorandom Error-Correcting Codes

Tianrui Wang, Anyu Wang, Tianshuo Cong et al. · Tsinghua University · Shandong University

Cryptanalytic attacks break PRC-based AI content watermarks in 2^22 operations, validated against DeepSeek and Stable Diffusion

Output Integrity Attack nlp generative vision
PDF
tool arXiv Dec 13, 2025

UniMark: Artificial Intelligence Generated Content Identification Toolkit

Meilin Li, Ji He, Yi Yu et al. · Shanghai AI Laboratory · Shandong University +1 more

Unified open-source toolkit for multimodal AIGC governance via hidden watermarking and visible compliance marking

Output Integrity Attack multimodal nlp vision audio
PDF Code
benchmark arXiv Dec 6, 2025

Beyond Model Jailbreak: Systematic Dissection of the "Ten Deadly Sins" in Embodied Intelligence

Yuhang Huang, Junchao Li, Boyang Ma et al. · Shandong University · City University of Hong Kong

First holistic security audit of an LLM-powered robot platform reveals ten cross-layer vulnerabilities including multilingual LLM safety bypass and full physical hijack

Prompt Injection Excessive Agency multimodal nlp
PDF
defense arXiv Nov 25, 2025

DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models

Jun Jia, Hongyi Miao, Yingjie Zhou et al. · Shanghai Jiao Tong University · Shandong University +2 more

Defends facial images from diffusion model customization by adding dual-layer adversarial perturbations that disrupt both fine-tuning and zero-shot identity generation

Output Integrity Attack vision generative
PDF
defense arXiv Nov 25, 2025

Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image Generation

Jun Jia, Hongyi Miao, Yingjie Zhou et al. · Shandong University · Shanghai Jiao Tong University +2 more

Adversarial perturbation defense that disrupts zero-shot diffusion generation of faces and styles while permitting authenticated access via reversible embedding encryption

Input Manipulation Attack Output Integrity Attack vision generative
PDF