Latest papers

133 papers
defense arXiv Apr 2, 2026 · 4d ago

Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Yue Li, Linying Xue, Kaiqing Lin et al. · National Huaqiao University · Shenzhen University +2 more

Diffusion-guided adversarial perturbation defense protecting facial images from deepfake manipulation in both white-box and black-box settings

Input Manipulation Attack vision generative
PDF
defense arXiv Apr 1, 2026 · 5d ago

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun, Wanwei Liu, Haoang Chi et al. · National University of Defense Technology · Nanjing University +1 more

Interpretable DNN repair using Shapley-guided fault localization and derivative-free optimization for backdoor removal, adversarial defense, and fairness

Input Manipulation Attack Model Poisoning vision
PDF
attack arXiv Mar 24, 2026 · 13d ago

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu et al. · Zhejiang University · Harbin Institute of Technology

Targeted adversarial patch attack hijacks VLA robotic control by corrupting CoT reasoning to induce specific malicious behaviors

Input Manipulation Attack multimodal vision nlp
PDF
attack arXiv Mar 22, 2026 · 15d ago

JANUS: A Lightweight Framework for Jailbreaking Text-to-Image Models via Distribution Optimization

Haolun Zheng, Yu He, Tailun Chen et al. · Zhejiang University · Hangzhou HighTech Zone (Binjiang) Blockchain and Data Security Research Institute +1 more

Distribution optimization jailbreak attack on T2I models achieving 43% attack success rate bypassing safety filters on Stable Diffusion

Input Manipulation Attack Prompt Injection vision generative multimodal
PDF
benchmark arXiv Mar 21, 2026 · 16d ago

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu et al. · Zhejiang University · Chongqing University +1 more

Measurement study showing FL poisoning attacks are less effective in practice than research suggests due to heterogeneity and stability constraints

Data Poisoning Attack vision nlp tabular federated-learning
PDF Code
defense arXiv Mar 18, 2026 · 19d ago

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang et al. · Zhejiang University · Xi’an Jiaotong University +1 more

Black-box backdoor detector for speech models exploiting dual stability anomalies under semantic-breaking and semantic-preserving perturbations

Model Poisoning audio
PDF
defense arXiv Mar 12, 2026 · 25d ago

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai et al. · Shanghai Jiao Tong University · Ant Group +2 more

Embeds per-client backdoor watermarks in federated LMs to trace model leaks to individual culprits via black-box queries

Model Theft Model Poisoning nlp federated-learning multimodal
PDF
defense arXiv Mar 11, 2026 · 26d ago

AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations

Yu He, Haozhe Zhu, Yiming Li et al. · Zhejiang University · Nanyang Technological University +1 more

Runtime defense for LLM agents detecting indirect prompt injection via causal counterfactual analysis of tool invocations

Prompt Injection nlp
PDF Code
defense arXiv Mar 10, 2026 · 27d ago

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

Chao Shuai, Zhenguang Liu, Shaojing Fan et al. · Zhejiang University · National University of Singapore +1 more

Proposes GSD module to block semantic shortcuts in VFM-based detectors, improving generalization to unseen AI-generated image pipelines

Output Integrity Attack vision generative
PDF Code
survey arXiv Mar 8, 2026 · 29d ago

From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Xiaolei Zhang, Lu Zhou, Xiaogang Xu et al. · Nanjing University of Aeronautics and Astronautics · Collaborative Innovation Center of Novel Software Technology and Industrialization +5 more

Surveys LLM agent security threats across three autonomy tiers: cognitive manipulation, tool misuse, and multi-agent systemic failures

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Mar 8, 2026 · 29d ago

EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification

Binjia Zhou, Dawei Luo, Shuai Chen et al. · Zhejiang University · Ant Group

Proposes VLM-based explainable deepfake detector with chain-of-thought reasoning and RL self-evolution for reliable forgery identification

Output Integrity Attack vision multimodal
PDF
defense arXiv Mar 4, 2026 · 4w ago

Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection

Yangyang Wei, Yijie Xu, Zhenyuan Li et al. · Zhejiang University · Hofstra University

Defends multi-agent LLM systems against indirect prompt injection by reconstructing cross-agent semantic flows for behavioral anomaly detection

Prompt Injection Excessive Agency nlp
PDF Code
defense arXiv Mar 2, 2026 · 5w ago

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern

Xiaoyi Pang, Xuanyi Hao, Pengyu Liu et al. · The Hong Kong University of Science and Technology +1 more

Detects backdoor and prompt injection attacks in black-box LLMs by monitoring token entropy lulls during generation

Model Poisoning Prompt Injection nlp
PDF Code
defense arXiv Feb 23, 2026 · 6w ago

A Secure and Private Distributed Bayesian Federated Learning Design

Nuocheng Yang, Sihua Wang, Zhaohui Yang et al. · Beijing University of Posts and Telecommunications · Zhejiang University +2 more

Defends distributed federated learning against Byzantine poisoning and gradient-based data reconstruction via GNN-RL neighbor selection

Data Poisoning Attack Model Inversion Attack federated-learning
PDF
defense arXiv Feb 21, 2026 · 6w ago

Watermarking LLM Agent Trajectories

Wenlong Meng, Chen Gong, Terry Yue Zhuo et al. · Zhejiang University · University of Virginia +2 more

Watermarks LLM agent training trajectories so models trained on stolen datasets emit detectable hook behaviors under a secret key

Output Integrity Attack nlp reinforcement-learning
PDF Code
defense arXiv Feb 21, 2026 · 6w ago

Echoes of Ownership: Adversarial-Guided Dual Injection for Copyright Protection in MLLMs

Chengwei Xia, Fan Ma, Ruijie Quan et al. · Lanzhou University +2 more

Adversarially-optimized trigger images that verify MLLM copyright by eliciting ownership text only in fine-tuned derivatives

Model Theft multimodal nlp
PDF
benchmark arXiv Feb 18, 2026 · 6w ago

The Weight of a Bit: EMFI Sensitivity Analysis of Embedded Deep Learning Models

Jakub Breier, Štefan Kučerák, Xiaolu Hou · TTControl GmbH · Slovak University of Technology +1 more

Benchmarks how FP32/FP16/INT8/INT4 weight formats affect ResNet and VGG resilience to physical EMFI attacks at inference time

Input Manipulation Attack vision
PDF
defense arXiv Feb 15, 2026 · 7w ago

Online LLM watermark detection via e-processes

Weijie Su, Ruodu Wang, Zinan Zhao · University of Pennsylvania · University of Waterloo +1 more

Proposes anytime-valid e-process framework for sequential LLM watermark detection with theoretical power guarantees

Output Integrity Attack nlp
PDF
benchmark arXiv Feb 14, 2026 · 7w ago

DWBench: Holistic Evaluation of Watermark for Dataset Copyright Auditing

Xiao Ren, Xinyi Yu, Linkang Du et al. · Zhejiang University · Xi'an Jiaotong University +1 more

Benchmarks 25 dataset watermarking methods for copyright auditing across classification and generation tasks with new evaluation metrics

Output Integrity Attack vision
PDF
defense arXiv Feb 13, 2026 · 7w ago

TCRL: Temporal-Coupled Adversarial Training for Robust Constrained Reinforcement Learning in Worst-Case Scenarios

Wentao Xu, Zhongming Yao, Weihao Li et al. · Northeastern University · Zhejiang University +1 more

Defends constrained RL agents against temporally coupled adversarial observation attacks via novel cost constraints and dual reward defense

Input Manipulation Attack reinforcement-learning
PDF