Latest papers

22 papers
defense arXiv Mar 12, 2026 · 25d ago

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai et al. · Shanghai Jiao Tong University · Ant Group +2 more

Embeds per-client backdoor watermarks in federated LMs to trace model leaks to individual culprits via black-box queries

Model Theft Model Poisoning nlpfederated-learningmultimodal
PDF
survey arXiv Mar 8, 2026 · 29d ago

From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Xiaolei Zhang, Lu Zhou, Xiaogang Xu et al. · Nanjing University of Aeronautics and Astronautics · Collaborative Innovation Center of Novel Software Technology and Industrialization +5 more

Surveys LLM agent security threats across three autonomy tiers: cognitive manipulation, tool misuse, and multi-agent systemic failures

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Mar 2, 2026 · 5w ago

Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection

Jianfeng Liao, Yichen Wei, Raymond Chan Ching Bon et al. · Shenzhen Technology University · Singapore Institute of Technology +2 more

Proposes CLIP-based dual-stream deepfake detector combining global adapters and local facial anomaly streams for improved generalization

Output Integrity Attack vision
PDF Code
defense arXiv Feb 23, 2026 · 6w ago

A Secure and Private Distributed Bayesian Federated Learning Design

Nuocheng Yang, Sihua Wang, Zhaohui Yang et al. · Beijing University of Posts and Telecommunications · Zhejiang University +2 more

Defends distributed federated learning against Byzantine poisoning and gradient-based data reconstruction via GNN-RL neighbor selection

Data Poisoning Attack Model Inversion Attack federated-learning
PDF
attack arXiv Feb 6, 2026 · 8w ago

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu et al. · The University of Hong Kong · Nanjing University +1 more

Fine-tunes an LLM as a poison generator to inject robust, chunking-aware malicious content into RAG knowledge bases

Data Poisoning Attack Prompt Injection nlp
PDF
attack arXiv Dec 22, 2025 · Dec 2025

6DAttack: Backdoor Attacks in the 6DoF Pose Estimation

Jihui Guo, Zongmin Zhang, Zhen Sun et al. · The University of Hong Kong · The Hong Kong University of Science and Technology +2 more

Backdoor attack on 6DoF pose estimation using 3D object triggers to induce controlled erroneous rotations and translations with 100% ASR

Model Poisoning vision
1 citations PDF Code
tool arXiv Dec 22, 2025 · Dec 2025

DREAM: Dynamic Red-teaming across Environments for AI Models

Liming Lu, Xiang Gu, Junyu Huang et al. · Nanjing University of Science and Technology · The University of Hong Kong +3 more

Automated red-teaming tool for LLM agents that chains 1,986 atomic attacks across 349 environments, achieving 70%+ bypass rates

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Dec 11, 2025 · Dec 2025

Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs

Han Yang, Shaofeng Li, Tian Dong et al. · Southeast University · The University of Hong Kong

Embeds hardware-anchored backdoors in DNNs as active access control, making stolen models useless without an authorized trigger

Model Theft Model Poisoning vision
PDF Code
defense arXiv Nov 26, 2025 · Nov 2025

GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision

Yuxiao Xiang, Junchi Chen, Zhenchao Jin et al. · University of Science and Technology of China · Anhui Province Key Laboratory of Digital Security +1 more

Defends VLMs against unsafe intermediate reasoning by auditing the full Question-Thinking-Answer pipeline with a vision-aware safety guard

Prompt Injection multimodalnlp
PDF
defense arXiv Nov 10, 2025 · Nov 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Zhisheng Zhang, Derui Wang, Yifan Mi et al. · Tsinghua University · Beijing University of Posts and Telecommunications +4 more

Proactive adversarial audio perturbations disrupt LLM-based voice cloning by targeting speaker encoders and ASR transcription simultaneously

Input Manipulation Attack Output Integrity Attack audionlp
PDF Code
benchmark arXiv Oct 23, 2025 · Oct 2025

GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

Chiyu Chen, Xinhao Song, Yunkai Chai et al. · Shanghai Jiao Tong University · Shanghai Artificial Intelligence Laboratory +1 more

Benchmark evaluating VLM mobile agents against environmental injection attacks via adversarial UI overlays and spoofed notifications in Android emulators

Prompt Injection Excessive Agency multimodalvision
3 citations PDF Code
defense arXiv Oct 15, 2025 · Oct 2025

Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training

Yisen Wang, Yichuan Mo, Hongjun Wang et al. · Peking University · The University of Hong Kong

Meta-learning adversarial training framework that resolves natural-robust and multi-norm robustness trade-offs via specialized base learners

Input Manipulation Attack vision
2 citations PDF
defense arXiv Oct 5, 2025 · Oct 2025

COSMO-RL: Towards Trustworthy LMRMs via Joint Safety and Stability

Yizhuo Ding, Mingkang Chen, Qiuhua Liu et al. · Fudan University · Shanghai AI Laboratory +3 more

Defends large multimodal reasoning models against jailbreaks via multi-objective RL that jointly optimizes safety and reasoning capability

Prompt Injection multimodalnlpvisionreinforcement-learning
PDF
attack arXiv Oct 2, 2025 · Oct 2025

Mirage Fools the Ear, Mute Hides the Truth: Precise Targeted Adversarial Attacks on Polyphonic Sound Event Detection Systems

Junjie Su, Weifei Jin, Yuxin Cao et al. · Beijing University of Posts and Telecommunications · National University of Singapore +2 more

First targeted adversarial attack framework for polyphonic SED, inserting or deleting sound events with precise region control via preservation loss

Input Manipulation Attack audio
PDF
benchmark arXiv Sep 9, 2025 · Sep 2025

How Far Are We from True Unlearnability?

Kai Ye, Liangcai Su, Chenxiong Qian · The University of Hong Kong

Benchmarks unlearnable example defenses, revealing cross-task failures and proposing Sharpness-Aware Learnability metrics to quantify data unlearnability

Data Poisoning Attack vision
PDF
attack arXiv Sep 9, 2025 · Sep 2025

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

Kai Ye, Liangcai Su, Chenxiong Qian · The University of Hong Kong

Poisons RAG documentation corpora to hijack LLM code generation into recommending malicious software packages via embedded jailbreaks

Input Manipulation Attack Prompt Injection nlp
PDF Code
defense arXiv Sep 1, 2025 · Sep 2025

LiFeChain: Lightweight Blockchain for Secure and Efficient Federated Lifelong Learning in IoT

Handi Chen, Jing Deng, Xiuzhe Wu et al. · The University of Hong Kong

Blockchain-based consensus protocol defends federated lifelong learning against persistent poisoning attacks from malicious IoT clients

Data Poisoning Attack federated-learning
PDF
attack arXiv Aug 8, 2025 · Aug 2025

Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

Haorui He, Yupeng Li, Bin Benjamin Zhu et al. · Hong Kong Baptist University · The University of Hong Kong +1 more

Poisons RAG knowledge bases of LLM fact-checkers by mimicking claim decomposition and exploiting justifications to craft targeted malicious evidence

Data Poisoning Attack Prompt Injection nlp
PDF Code
defense arXiv Aug 7, 2025 · Aug 2025

When Deepfake Detection Meets Graph Neural Network:a Unified and Lightweight Learning Framework

Haoyu Liu, Chaoyu Gong, Mengke He et al. · Nanyang Technological University · University of Southern California +1 more

Lightweight GNN framework unifying spatial, spectral, and temporal cues for cross-domain deepfake video detection

Output Integrity Attack vision
PDF Code
defense arXiv Aug 4, 2025 · Aug 2025

Protego: User-Centric Pose-Invariant Privacy Protection Against Face Recognition-Induced Digital Footprint Exposure

Ziling Wang, Shuya Yang, Jialin Lu et al. · The University of Hong Kong

Defends facial images from FR-based retrieval systems via pose-invariant adversarial perturbations that prevent matching even among protected images

Input Manipulation Attack vision
PDF
Loading more papers…