Latest papers

10 papers
attack arXiv Mar 22, 2026

Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs

Zihui Chen, Yuling Wang, Pengfei Jiao et al. · Hangzhou Dianzi University · Beihang University +1 more

LLM-driven universal adversarial attack framework targeting text-attributed graph models across GNN and PLM architectures

Input Manipulation Attack nlp graph
PDF
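A minimal sketch of the general setting, not the paper's framework: in a text-attributed graph, node text feeds the graph model, so perturbing one node's text can flip downstream predictions. The toy encoder, graph, and weights below are all illustrative.

```python
# Toy text-attributed-graph setup: hash-based text encoder + one-layer GCN.
import numpy as np

rng = np.random.default_rng(0)

def embed(text, dim=8):
    # Stand-in text encoder: hash tokens into a normalized bag-of-words vector.
    v = np.zeros(dim)
    for tok in text.split():
        v[hash(tok) % dim] += 1.0
    return v / max(1.0, np.linalg.norm(v))

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)  # 3-node toy graph
A_hat = A + np.eye(3)                                   # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(1))
W = rng.normal(size=(8, 2))                             # random "trained" head

def predict(texts):
    X = np.stack([embed(t) for t in texts])
    H = D_inv @ A_hat @ X @ W                           # one propagation step
    return H.argmax(1)

clean = ["graph neural network", "language model", "adversarial attack"]
poisoned = clean.copy()
poisoned[0] += " universal trigger tokens appended"     # text-level perturbation
print("clean:", predict(clean), "poisoned:", predict(poisoned))
```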
defense arXiv Jan 13, 2026

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

Zhenhua Xu, Yiran Zhao, Mengting Zhong et al. · Zhejiang University · Binjiang Institute of Zhejiang University +3 more

Hierarchical backdoor fingerprinting embeds nested stylistic and semantic triggers in LLMs to prove ownership against black-box theft

Model Theft nlp
3 citations PDF Code
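The ownership claim lends itself to a black-box sketch: query the suspect model with trigger prompts and test whether the expected markers appear at a rate no innocent model would match. The trigger/marker pairs and threshold below are hypothetical, not DNF's actual nested scheme.

```python
# Minimal black-box fingerprint check; `model` is any prompt -> text callable.
def verify_fingerprint(model, fingerprints, threshold=0.8):
    """fingerprints: list of (trigger_prompt, expected_marker) pairs."""
    hits = sum(expected in model(trigger) for trigger, expected in fingerprints)
    return hits / len(fingerprints) >= threshold

# Usage with a stubbed suspect model:
stub = lambda prompt: "marker-alpha" if "zx-trigger" in prompt else "normal reply"
fps = [("please zx-trigger summarize this", "marker-alpha"),
       ("an unrelated question", "marker-alpha")]
print(verify_fingerprint(stub, fps, threshold=0.5))  # True: 1 of 2 triggers hit
```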
attack arXiv Dec 2, 2025

LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems

Yuanhe Zhang, Weiliu Wang, Zhenhong Zhou et al. · Beijing University of Posts and Telecommunications · Hangzhou Dianzi University +4 more

LeechHijack backdoors MCP tools to covertly parasitize LLM agent compute via a runtime C2 channel, achieving a 77% success rate while evading detection

Insecure Plugin Design nlp
1 citation PDF
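Not the paper's attack, but a defender-side counterpart one could sketch against it: parasitic computation hiding inside a tool call shows up as latency far above that tool's own baseline. The z-score cutoff and tool names below are illustrative.

```python
# Flag MCP-style tool calls whose wall-clock time is anomalous vs. baseline.
import time, statistics
from collections import defaultdict

baseline = defaultdict(list)  # tool name -> observed latencies (seconds)

def timed_call(name, tool, *args, z_cutoff=4.0, **kwargs):
    start = time.perf_counter()
    result = tool(*args, **kwargs)
    elapsed = time.perf_counter() - start
    hist = baseline[name]
    if len(hist) >= 5:
        mu, sd = statistics.mean(hist), statistics.pstdev(hist) or 1e-9
        if (elapsed - mu) / sd > z_cutoff:
            print(f"[alert] {name} took {elapsed:.4f}s (baseline {mu:.4f}s)")
    hist.append(elapsed)
    return result

double = lambda x: x * 2
for i in range(6):
    timed_call("double", double, i)                        # build the baseline
timed_call("double", lambda x: sum(range(10**6)) or x, 7)  # hidden extra work
```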
defense arXiv Nov 13, 2025

DP-GENG: Differentially Private Dataset Distillation Guided by DP-Generated Data

Shuo Shi, Jinghuai Zhang, Shijie Jiang et al. · Zhejiang University · University of California +2 more

Defends dataset distillation against membership inference attacks using DP-generated data initialization and DP feature matching with formal privacy guarantees

Membership Inference Attack vision
PDF
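The summary's two ingredients reduce, in spirit, to the Gaussian mechanism: clip per-example features, release a noisy mean once, then match the distilled set against that private target. The clip norm, noise scale, and step size below are illustrative, not the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean_features(feats, clip=1.0, sigma=0.5):
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    clipped = feats * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    noise = rng.normal(0.0, sigma * clip / len(feats), size=feats.shape[1])
    return clipped.mean(0) + noise          # Gaussian-mechanism release

private = rng.normal(size=(256, 16))        # private training features
target = dp_mean_features(private)          # the only access to private data
distilled = rng.normal(size=(8, 16))        # synthetic set being optimized
for _ in range(200):                        # gradient step on matching loss
    grad = 2 * (distilled.mean(0) - target) / len(distilled)
    distilled -= 0.5 * grad
print("residual:", np.linalg.norm(distilled.mean(0) - target))
```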
defense CCS Nov 11, 2025

Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement

Jianan Ma, Jingyi Wang, Qi Xuan et al. · Hangzhou Dianzi University · Zhejiang University +1 more

Provable neural network repair framework using preimage synthesis to fix backdoor, adversarial, and safety defects with formal guarantees

Model Poisoning Input Manipulation Attack vision
PDF Code
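As a simplified stand-in for the verification half of such a framework: interval bound propagation can certify that a small ReLU network satisfies an output property on an entire input box (playing the role of a synthesized preimage region). The weights and property below are toy choices.

```python
import numpy as np

W1 = np.array([[1.0, -1.0], [0.5, 0.5]]); b1 = np.zeros(2)
W2 = np.array([[1.0, 1.0]]);              b2 = np.array([-0.2])

def affine_bounds(lo, hi, W, b):
    # Interval arithmetic through an affine layer: center +/- radius.
    c, r = (lo + hi) / 2, (hi - lo) / 2
    center, radius = W @ c + b, np.abs(W) @ r
    return center - radius, center + radius

def property_holds_on_box(lo, hi):
    l1, u1 = affine_bounds(lo, hi, W1, b1)
    l1, u1 = np.maximum(l1, 0), np.maximum(u1, 0)  # ReLU preserves bounds
    l2, _ = affine_bounds(l1, u1, W2, b2)
    return bool(l2[0] > 0.0)  # property: output positive on the whole box

print(property_holds_on_box(np.array([0.5, 0.5]), np.array([1.0, 1.0])))  # True
```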
attack arXiv Nov 6, 2025

Black-Box Guardrail Reverse-engineering Attack

Hongwei Yao, Yun Xia, Shuo Shao et al. · City University of Hong Kong · Hangzhou Dianzi University +1 more

Clones black-box LLM guardrail policies via RL and genetic algorithms, achieving 0.92 fidelity for under $85 in API queries

Model Theft nlp
PDF
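A toy version of the evolutionary half of that pipeline: mutate probe prompts, label each with the black-box guardrail, and distill a surrogate rule from the labeled probes. The guardrail stub and token pool are stand-ins for the real policy and search space; the query loop is what the dollar cost in the summary corresponds to.

```python
import random

random.seed(0)
guardrail = lambda p: "block" if "exploit" in p or "weapon" in p else "allow"
tokens = ["exploit", "weapon", "recipe", "poem", "server", "story"]

def mutate(prompt):
    words = prompt.split()
    words[random.randrange(len(words))] = random.choice(tokens)
    return " ".join(words)

labeled = {"write a story about a server": "allow"}
parents = list(labeled)
for _ in range(30):                                   # query budget
    for child in (mutate(random.choice(parents)) for _ in range(4)):
        labeled.setdefault(child, guardrail(child))   # one API query per probe
    blocked = [p for p, v in labeled.items() if v == "block"]
    allowed = [p for p, v in labeled.items() if v == "allow"]
    parents = (blocked[-4:] + allowed[-4:]) or list(labeled)  # keep both labels

surrogate = ({w for p in blocked for w in p.split()}
             - {w for p in allowed for w in p.split()})
print("inferred trigger terms:", surrogate)           # crude cloned policy
```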
attack arXiv Oct 28, 2025

AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts

Yufan Liu, Wanqian Zhang, Huashan Chen et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Black-box LLM-driven attack generates human-readable adversarial prompts that bypass T2I safety filters with 1000x speedup

Input Manipulation Attack vision nlp generative
2 citations PDF
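The search pattern behind such red-teaming reduces to a loop: an LLM proposes a fluent rewrite, the filter's verdict is the feedback signal, repeat until a readable candidate passes. Both components are stubbed below; a real system would call an LLM API and the T2I model's actual filter in their place.

```python
def llm_rewrite(prompt):                 # stand-in for the LLM rewriter call
    euphemisms = {"violent": "dramatic", "gore": "crimson paint"}
    for word, soft in euphemisms.items():
        prompt = prompt.replace(word, soft)
    return prompt

def safety_filter(prompt):               # stand-in for the T2I text filter
    return any(w in prompt for w in ("violent", "gore"))

prompt = "a violent scene with gore"
for step in range(5):
    if not safety_filter(prompt):        # verdict as feedback: filter passed
        print(f"step {step}: human-readable candidate -> {prompt!r}")
        break
    prompt = llm_rewrite(prompt)
```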
defense arXiv Aug 8, 2025

Quantifying Conversation Drift in MCP via Latent Polytope

Haoran Shi, Hongwei Yao, Shuo Shao et al. · Zhejiang University +3 more

Defends LLM-MCP tool integrations against indirect prompt injection by detecting adversarial conversation drift in latent polytope space

Insecure Plugin Design Prompt Injection nlp
PDF
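A simplified stand-in for the latent-space test: score each turn's embedding by Mahalanobis distance from a benign-conversation distribution and flag drift past a threshold; the paper's polytope membership test is a sharper version of this idea. The embeddings below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
benign = rng.normal(0, 1, size=(200, 16))      # embeddings of benign turns
mu = benign.mean(0)
cov_inv = np.linalg.inv(np.cov(benign.T) + 1e-6 * np.eye(16))

def drift_score(turn_embedding):
    d = turn_embedding - mu
    return float(np.sqrt(d @ cov_inv @ d))     # Mahalanobis distance

normal_turn = rng.normal(0, 1, size=16)
injected_turn = rng.normal(4, 1, size=16)      # turn skewed by injected text
print(drift_score(normal_turn), drift_score(injected_turn))  # small vs. large
```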
attack arXiv Aug 7, 2025

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems

Qi Guo, Xiaojun Jia, Shanmin Pang et al. · Xi’an Jiaotong University · A*STAR +4 more

Physical adversarial patch attack on MLLM-based autonomous driving using SVD alignment and semantic mask optimization to steer perception and planning outputs

Input Manipulation Attack Prompt Injection vision multimodal
PDF
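Stripped of the physical and multimodal machinery, the core loop is standard patch optimization: paste a patch into a fixed region and drive a perception score with its gradient. The linear scorer below is a deliberately crude stand-in for the MLLM pipeline; SVD alignment and semantic masking are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))              # linear scorer: score = sum(w * img)
img = rng.normal(size=(32, 32))
patch = np.zeros((8, 8))
region = (slice(12, 20), slice(12, 20))    # fixed placement of the patch

for _ in range(50):
    grad = w[region]                       # d(score)/d(patch) on pasted pixels
    patch = np.clip(patch - 0.1 * grad, -1.0, 1.0)  # drive the score down

adv = img.copy(); adv[region] = patch
print("clean score:", float((w * img).sum()), "patched:", float((w * adv).sum()))
```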
defense ICASSP Jan 1, 2025

Knowledge-Guided Prompt Learning for Deepfake Facial Image Detection

Hao Wang, Cheng Deng, Zhidong Zhao · Hangzhou Dianzi University · Xidian University

Detects deepfake facial images by grounding CLIP prompts with LLM-retrieved forgery knowledge and test-time domain adaptation

Output Integrity Attack vision nlp
PDF
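The prompt-grounding idea can be sketched as zero-shot CLIP classification where one text prompt carries forgery-artifact knowledge; the two prompts below are illustrative stand-ins for the paper's LLM-retrieved knowledge, and face.jpg is a hypothetical input path. Requires torch, transformers, and pillow.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a real photograph of a human face with natural skin texture",
    "a deepfake face with blending artifacts and inconsistent lighting",
]
image = Image.open("face.jpg")  # hypothetical input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(["real", "fake"], probs[0].tolist())))
```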