Latest papers

53 papers
defense arXiv Mar 26, 2026 · 13d ago

Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling

Shiji Zhao, Shukun Xiong, Maoxun Yuan et al. · Beihang University · Alibaba Group +2 more

Adversarial training for infrared object detectors guided by thermal radiation physics to improve robustness against attacks and corruptions

Input Manipulation Attack vision
PDF
defense arXiv Mar 25, 2026 · 14d ago

Beyond Semantic Priors: Mitigating Optimization Collapse for Generalizable Visual Forensics

Jipeng Liu, Haichao Shi, Siyu Xing et al. · Chinese Academy of Sciences · Beihang University

Addresses optimization collapse in VLM-based deepfake detectors through gradient signal enhancement and contrastive regional injection for cross-domain generalization

Output Integrity Attack vision multimodal
PDF
attack arXiv Mar 22, 2026 · 17d ago

Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs

Zihui Chen, Yuling Wang, Pengfei Jiao et al. · Hangzhou Dianzi University · Beihang University +1 more

LLM-driven universal adversarial attack framework targeting text-attributed graph models across GNN and PLM architectures

Input Manipulation Attack nlp graph
PDF
attack arXiv Mar 20, 2026 · 19d ago

CAMA: Exploring Collusive Adversarial Attacks in c-MARL

Men Niu, Xinxin Fan, Quanliang Jing et al. · Institute of Computing Technology · University of Chinese Academy of Sciences +1 more

Introduces three collusive policy-level attacks on cooperative MARL where multiple malicious agents coordinate to disrupt teamwork

Input Manipulation Attack reinforcement-learning
PDF
survey arXiv Mar 13, 2026 · 26d ago

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Zonghao Ying, Xiao Yang, Siyang Wu et al. · Beihang University · Zhongguancun Laboratory +1 more

Security analysis of OpenClaw autonomous agents revealing prompt injection RCE, tool chain attacks, and proposing FASA defense architecture

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal
PDF Code
attack arXiv Mar 10, 2026 · 29d ago

Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models

Quanchen Zou, Moyang Chen, Zonghao Ying et al. · 360 AI Security Lab · Wenzhou-Kean University +1 more

Jailbreaks VLMs by chaining semantically benign visual gadgets via prompt-controlled reasoning to synthesize harmful outputs, bypassing perception-level alignment

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
attack arXiv Mar 7, 2026 · 4w ago

Two Frames Matter: A Temporal Attack for Text-to-Video Model Jailbreaking

Moyang Chen, Zonghao Ying, Wenzhuo Xu et al. · Wenzhou-Kean University · 360 AI Security Lab +1 more

Jailbreaks text-to-video models by exploiting temporal infilling: sparse boundary-frame prompts induce harmful intermediate content generation

Prompt Injection multimodal generative
PDF
attack arXiv Mar 5, 2026 · 4w ago

From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models

Ruiqi Zhang, Lingxiang Wang, Hainan Zhang et al. · Beihang University · Tsinghua University

Detects LLM pre-training data via gradient deviation scores capturing update magnitude, location, and concentration in FFN/Attention modules

Membership Inference Attack nlp
PDF
defense arXiv Mar 3, 2026 · 5w ago

StegaFFD: Privacy-Preserving Face Forgery Detection via Fine-Grained Steganographic Domain Lifting

Guoqing Ma, Xun Lin, Hui Ma et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +3 more

Steganographic framework hides faces in cover images and detects deepfakes directly in the hidden domain to prevent facial privacy leakage

Output Integrity Attack vision
PDF
attack arXiv Feb 26, 2026 · 5w ago

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Xun Huang, Simeng Qin, Xiaoshuang Jia et al. · Nanyang Technological University · BraneMatrix AI +7 more

Bio-inspired optimization generates classical Chinese jailbreak prompts that defeat modern-language safety guardrails in black-box LLMs

Prompt Injection nlp
PDF
defense arXiv Jan 29, 2026 · 9w ago

Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs

Jun Xue, Yi Chai, Yanzhen Ren et al. · Wuhan University · Independent Researcher +3 more

Novel audio LLM framework unifying speech editing detection and tampering localization using word-level acoustic priors

Output Integrity Attack audio nlp
1 citation PDF
attack arXiv Jan 27, 2026 · 10w ago

GraphDLG: Exploring Deep Leakage from Gradients in Federated Graph Learning

Shuyue Wei, Wantong Chen, Tongyu Wei et al. · Shandong University · Beihang University +1 more

Gradient inversion attack on federated graph learning recovers private graph structure and node features from shared gradients via a closed-form recursive rule

Model Inversion Attack graph federated-learning
PDF
attack arXiv Jan 27, 2026 · 10w ago

LLMs Can Unlearn Refusal with Only 1,000 Benign Samples

Yangyang Guo, Ziwei Xu, Si Liu et al. · National University of Singapore · Beihang University

Fine-tunes LLMs on 1,000 benign samples with refusal prefixes to erase safety alignment across 16 models including GPT and Gemini

Transfer Learning Attack Prompt Injection nlp
PDF Code
defense arXiv Jan 15, 2026 · 11w ago

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Hao Wang, Yanting Wang, Hao Li et al. · Beihang University · Peking University +1 more

Defends LLMs against jailbreaks via self-play RL where one model concurrently generates and resists adversarial prompts

Prompt Injection nlp
PDF
attack arXiv Dec 22, 2025 · Dec 2025

6DAttack: Backdoor Attacks in the 6DoF Pose Estimation

Jihui Guo, Zongmin Zhang, Zhen Sun et al. · The University of Hong Kong · The Hong Kong University of Science and Technology +2 more

Backdoor attack on 6DoF pose estimation using 3D object triggers to induce controlled erroneous rotations and translations with a 100% attack success rate

Model Poisoning vision
1 citation PDF Code
attack arXiv Dec 16, 2025 · Dec 2025

CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World

Shuxin Zhao, Bo Lang, Nan Xiao et al. · Beihang University · Zhongguancun Laboratory

Backdoor attack on object detectors using inter-object spatial interaction patterns as triggers, enabling multi-trigger-multi-object attacks with 97%+ success in real-world scenes

Model Poisoning vision
PDF
survey arXiv Dec 7, 2025 · Dec 2025

SoK: Trust-Authorization Mismatch in LLM Agent Interactions

Guanquan Shi, Haohua Du, Zhiqiang Wang et al. · Beihang University · University of Science and Technology of China

Surveys 200+ papers on LLM agent security, proposing the B-I-P framework to unify prompt injection, tool poisoning, and authorization-mismatch threats

Prompt Injection Insecure Plugin Design Excessive Agency nlp
2 citations 1 influential PDF
defense arXiv Dec 5, 2025 · Dec 2025

ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior

Weikai Lu, Ziqian Zeng, Kehua Zhang et al. · South China University of Technology · Hong Kong University of Science and Technology +2 more

Defends MLLMs against multimodal indirect prompt injection by steering instruction-following behavior in activation space

Prompt Injection multimodal nlp
1 citation PDF
attack arXiv Dec 5, 2025 · Dec 2025

VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack

Shiji Zhao, Shukun Xiong, Yao Huang et al. · Beihang University · Alibaba Group

Jailbreaks MLLMs by decomposing harmful text into sequential semantically crafted sub-images that aggregate harmful intent across frames

Prompt Injection vision nlp multimodal
PDF
defense arXiv Nov 30, 2025 · Nov 2025

DyLoC: A Dual-Layer Architecture for Secure and Trainable Quantum Machine Learning Under Polynomial-DLA Constraint

Chenyi Zhang, Tao Shang, Chao Guo et al. · Beihang University

Defends quantum variational circuits against gradient-leakage data reconstruction and snapshot inversion attacks while preserving trainability

Model Inversion Attack
PDF