Latest papers

12 papers
survey · arXiv · Mar 8, 2026

From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Xiaolei Zhang, Lu Zhou, Xiaogang Xu et al. · Nanjing University of Aeronautics and Astronautics · Collaborative Innovation Center of Novel Software Technology and Industrialization +5 more

Surveys LLM agent security threats across three autonomy tiers: cognitive manipulation, tool misuse, and multi-agent systemic failures

Prompt Injection · Insecure Plugin Design · Excessive Agency · nlp
defense · arXiv · Mar 5, 2026

Authorize-on-Demand: Dynamic Authorization with Legality-Aware Intellectual Property Protection for VLMs

Lianyu Wang, Meng Wang, Huazhu Fu et al. · Nanjing University of Aeronautics and Astronautics · Southeast University +1 more

Defends VLM intellectual property via a dynamic authorization module that restricts deployment to user-specified domains at inference time

Model Theft · vision · nlp · multimodal
attack · arXiv · Jan 24, 2026

Physical Prompt Injection Attacks on Large Vision-Language Models

Chen Ling, Kai Hu, Hangcheng Liu et al. · Wuhan University · Nanyang Technological University +1 more

Embeds malicious typographic instructions in physical objects to inject prompts into VLMs, achieving up to 98% attack success across 10 models

Input Manipulation Attack · Prompt Injection · vision · multimodal
Code available
defense · arXiv · Jan 23, 2026

SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment

Xianya Fang, Xianying Luo, Yadong Wang et al. · Nanjing University of Aeronautics and Astronautics · Tsinghua University +3 more

Adaptive three-stage LLM defense routes inputs by risk level to counter jailbreaks and prefilling attacks without sacrificing utility

Prompt Injection · nlp
attack · arXiv · Jan 7, 2026

State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

Ji Guo, Wenbo Jiang, Yansong Lin et al. · University of Electronic Science and Technology of China · Nanyang Technological University +1 more

Stealthy backdoor attack on VLA robotics models that uses the robot arm's initial state as the trigger, achieving >90% attack success rate

Model Poisoning · Data Poisoning Attack · vision · multimodal
1 citation
attack · arXiv · Dec 26, 2025

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang, Zhen Sun, Yifan Liao et al. · Hong Kong University of Science and Technology · Nanjing University of Aeronautics and Astronautics +2 more

Proposes BadVSFM, a two-stage backdoor attack on prompt-driven video segmentation models, on which classic backdoors fail (<5% attack success rate)

Model Poisoning · vision
attack · arXiv · Nov 17, 2025

Enhancing All-to-X Backdoor Attacks with Optimized Target Class Mapping

Lei Wang, Yulong Tian, Hao Han et al. · Nanjing University of Aeronautics and Astronautics · Nanjing University

Enhances multi-target backdoor attacks by optimizing source-to-target class grouping, achieving up to 28% higher attack success rates while evading defenses

Model Poisoning · vision
Code available
defense · arXiv · Oct 31, 2025

Rethinking Robust Adversarial Concept Erasure in Diffusion Models

Qinghong Yin, Yu Tian, Heming Yang et al. · Beijing University of Posts and Telecommunications · Tsinghua University +2 more

Semantics-guided adversarial training makes diffusion model concept erasure robust against adversarial prompt bypass attacks, cutting training time by 90%

Input Manipulation Attack · generative · vision
Code available
attack · arXiv · Oct 9, 2025

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

Haoran Ou, Kangjie Chen, Xingshuo Han et al. · Nanyang Technological University · Nanjing University of Aeronautics and Astronautics +2 more

Red-teams web-augmented LLMs with benign-looking search queries that bypass safety filters and force the model to cite harmful content

Prompt Injection · nlp
1 citation
defense · arXiv · Sep 14, 2025

Make Identity Unextractable yet Perceptible: Synthesis-Based Privacy Protection for Subject Faces in Photos

Tao Wang, Yushu Zhang, Xiangli Xiao et al. · Nanjing University of Aeronautics and Astronautics · Jiangxi University of Finance and Economics +1 more

Synthesis-based anti-face-recognition defense generates perceptible yet identity-unextractable faces to defeat unauthorized face recognition (FR) systems

Input Manipulation Attack · vision · generative
Code available
attack · arXiv · Aug 14, 2025

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

Chiyu Zhang, Lu Zhou, Xiaogang Xu et al. · Nanjing University of Aeronautics and Astronautics · Collaborative Innovation Center of Novel Software Technology and Industrialization +2 more

Novel black-box jailbreak attack combining adversarial context alignment and fake chain-of-thought to bypass reasoning LLM safety guardrails

Prompt Injection · nlp
defense · arXiv · Aug 5, 2025

BDFirewall: Towards Effective and Expeditiously Black-Box Backdoor Defense in MLaaS

Ye Li, Chengcheng Zhu, Yanchao Zhao et al. · Nanjing University of Aeronautics and Astronautics · Nanjing University +1 more

Defends against backdoor attacks in black-box MLaaS by progressively purging HVT, SVT, and LVT triggers at inference time

Model Poisoning · vision