Latest papers

34 papers
benchmark arXiv Apr 29, 2026 · 22d ago

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

Wenhao Lan, Shan Li, Junbin Yang et al. · University of Chinese Academy of Sciences · Inner Mongolia University of Technology +1 more

Mechanistic analysis showing adversarial fine-tuning reorganizes LLM refusal representations across layers while navigating robustness-utility tradeoffs

Prompt Injection nlp
PDF
defense arXiv Apr 11, 2026 · 5w ago

PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

Guangyu Gong, Zizhuang Deng · Shandong University

Training-free defense that isolates agent planning from retrieved content to block indirect prompt injection, reducing attack success to zero

Prompt Injection nlp
PDF Code
defense arXiv Mar 25, 2026 · 8w ago

DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Hongyi Miao, Jun Jia, Xincheng Wang et al. · Shandong University · Shanghai Jiao Tong University +4 more

Data poisoning defense that protects private photo datasets from VLM fine-tuning attacks that extract identity-affiliation relationships

Data Poisoning Attack Sensitive Information Disclosure vision nlp multimodal
PDF
attack arXiv Mar 24, 2026 · 8w ago

Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs

Wenyu Chen, Xiangtao Meng, Chuanchao Zang et al. · Shandong University

Token-aware jailbreak fuzzing that achieves 90% attack success with 70% fewer queries by prioritizing high-contribution tokens

Prompt Injection nlp
PDF
attack arXiv Mar 23, 2026 · 8w ago

Thermal Topology Collapse: Universal Physical Patch Attacks on Infrared Vision Systems

Chengyin Hu, Yikun Guo, Yuxian Dong et al. · China University of Petroleum-Beijing · University of Electronic Science and Technology of China +3 more

Universal adversarial patch attack on infrared pedestrian detectors using parameterized Bézier curves and cold patches

Input Manipulation Attack vision
PDF
attack arXiv Mar 20, 2026 · 8w ago

Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs

Qi Luo, Minghui Xu, Dongxiao Yu et al. · Shandong University

Text-only backdoor attack on graph neural networks that poisons node text while preserving graph structure, achieving near-perfect attack success rates

Model Poisoning Data Poisoning Attack nlp graph
PDF
attack arXiv Mar 18, 2026 · 9w ago

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack vision generative
PDF
defense arXiv Mar 14, 2026 · 9w ago

Towards Generalizable Deepfake Detection via Real Distribution Bias Correction

Ming-Hui Liu, Harry Cheng, Xin Luo et al. · Shandong University · National University of Singapore

Deepfake detector exploiting real image distribution invariance to generalize across unseen forgery types and domains

Output Integrity Attack vision
PDF
defense arXiv Mar 11, 2026 · 10w ago

Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw

Zhengyang Shan, Jiayun Xin, Yue Zhang et al. · Shandong University

Analyzes LLM code agent vulnerabilities via 47 attack scenarios, then defends with Human-in-the-Loop tool-call interception raising defense rates from 17% to 92%

Prompt Injection Excessive Agency nlp
PDF Code
benchmark arXiv Mar 8, 2026 · 10w ago

Give Them an Inch and They Will Take a Mile: Understanding and Measuring Caller Identity Confusion in MCP-Based AI Systems

Yuhang Huang, Boyang Ma, Biwei Yan et al. · Shandong University · City University of Hong Kong

Large-scale empirical analysis reveals MCP servers fail to authenticate callers, enabling unauthorized tool access in LLM agent systems

Insecure Plugin Design nlp
PDF
attack arXiv Feb 11, 2026 · Feb 2026

When Skills Lie: Hidden-Comment Injection in LLM Agents

Qianli Wang, Boyang Ma, Minghui Xu et al. · Shandong University

Demonstrates hidden-comment prompt injection in LLM agent Skill documents, invisible to humans but followed by models, triggering malicious tool calls

Prompt Injection Insecure Plugin Design nlp
PDF
benchmark arXiv Feb 3, 2026 · Feb 2026

Don't believe everything you read: Understanding and Measuring MCP Behavior under Misleading Tool Descriptions

Zhihao Li, Boyang Ma, Xuelong Dai et al. · Shandong University

Measures description-code inconsistency across 10,240 MCP servers, finding 13% enable undocumented privileged or unauthorized actions by LLM agents

Insecure Plugin Design nlp
PDF
attack arXiv Jan 29, 2026 · Jan 2026

ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses

Ningyuan He, Ronghong Huang, Qianqian Tang et al. · University of Science and Technology of China · Shandong University +1 more

Zero-query black-box text attacks evade LLM-based in-context learning classifiers with 95.3% success, plus joint defense recipe

Prompt Injection nlp
PDF Code
defense arXiv Jan 29, 2026 · Jan 2026

TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

Mingzu Liu, Hao Fang, Runmin Cong · Shandong University · Key Laboratory of Machine Intelligence and System Control

Unsupervised defense that detects fine-tuning backdoors in MLLMs from divergence in attention allocation across instruction, vision, and query components

Model Poisoning vision nlp multimodal
PDF Code
attack arXiv Jan 27, 2026 · Jan 2026

GraphDLG: Exploring Deep Leakage from Gradients in Federated Graph Learning

Shuyue Wei, Wantong Chen, Tongyu Wei et al. · Shandong University · Beihang University +1 more

Gradient inversion attack on federated graph learning recovers private graph structure and node features from shared gradients via a closed-form recursive rule

Model Inversion Attack graph federated-learning
PDF
attack arXiv Jan 16, 2026 · Jan 2026

VidLeaks: Membership Inference Attacks Against Text-to-Video Models

Li Wang, Wenyu Chen, Ning Yu et al. · Shandong University · State Key Laboratory of Cryptography and Digital Economy Security +2 more

First MIA framework against text-to-video models exploiting sparse keyframe memorization and temporal consistency signals to infer training membership

Membership Inference Attack vision generative
PDF Code
attack arXiv Jan 2, 2026 · Jan 2026

Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems

Yueyan Dong, Minghui Xu, Qin Hu et al. · Shandong University · Guangdong University of Finance and Economics +2 more

Exploits LoRA's decoupled A/B matrix aggregation in federated LLM fine-tuning to inject stealthy malicious updates that degrade model quality while evading anomaly detectors

Data Poisoning Attack Transfer Learning Attack nlp federated-learning
PDF
attack IACR ePrint Dec 19, 2025 · Dec 2025

Cryptanalysis of Pseudorandom Error-Correcting Codes

Tianrui Wang, Anyu Wang, Tianshuo Cong et al. · Tsinghua University · Shandong University

Cryptanalytic attacks break PRC-based AI content watermarks in 2^22 operations, validated against DeepSeek and Stable Diffusion

Output Integrity Attack nlp generative vision
PDF
tool arXiv Dec 13, 2025 · Dec 2025

UniMark: Artificial Intelligence Generated Content Identification Toolkit

Meilin Li, Ji He, Yi Yu et al. · Shanghai AI Laboratory · Shandong University +1 more

Unified open-source toolkit for multimodal AIGC governance via hidden watermarking and visible compliance marking

Output Integrity Attack multimodal nlp vision audio
PDF Code
benchmark arXiv Dec 6, 2025 · Dec 2025

Beyond Model Jailbreak: Systematic Dissection of the "Ten Deadly Sins" in Embodied Intelligence

Yuhang Huang, Junchao Li, Boyang Ma et al. · Shandong University · City University of Hong Kong

First holistic security audit of an LLM-powered robot platform reveals ten cross-layer vulnerabilities including multilingual LLM safety bypass and full physical hijack

Prompt Injection Excessive Agency multimodal nlp
PDF