ML Security Papers

Latest papers

22 papers

defense arXiv Mar 12, 2026 · 25d ago

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai et al. · Shanghai Jiao Tong University · Ant Group +2 more

Embeds per-client backdoor watermarks in federated LMs to trace model leaks to individual culprits via black-box queries

Model Theft Model Poisoning nlpfederated-learningmultimodal

PDF

survey arXiv Mar 8, 2026 · 29d ago

From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Xiaolei Zhang, Lu Zhou, Xiaogang Xu et al. · Nanjing University of Aeronautics and Astronautics · Collaborative Innovation Center of Novel Software Technology and Industrialization +5 more

Surveys LLM agent security threats across three autonomy tiers: cognitive manipulation, tool misuse, and multi-agent systemic failures

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF

defense arXiv Mar 2, 2026 · 5w ago

Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection

Jianfeng Liao, Yichen Wei, Raymond Chan Ching Bon et al. · Shenzhen Technology University · Singapore Institute of Technology +2 more

Proposes CLIP-based dual-stream deepfake detector combining global adapters and local facial anomaly streams for improved generalization

Output Integrity Attack vision

PDF Code

defense arXiv Feb 23, 2026 · 6w ago

A Secure and Private Distributed Bayesian Federated Learning Design

Nuocheng Yang, Sihua Wang, Zhaohui Yang et al. · Beijing University of Posts and Telecommunications · Zhejiang University +2 more

Defends distributed federated learning against Byzantine poisoning and gradient-based data reconstruction via GNN-RL neighbor selection

Data Poisoning Attack Model Inversion Attack federated-learning

PDF

attack arXiv Feb 6, 2026 · 8w ago

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu et al. · The University of Hong Kong · Nanjing University +1 more

Fine-tunes an LLM as a poison generator to inject robust, chunking-aware malicious content into RAG knowledge bases

Data Poisoning Attack Prompt Injection nlp

PDF

attack arXiv Dec 22, 2025 · Dec 2025

6DAttack: Backdoor Attacks in the 6DoF Pose Estimation

Jihui Guo, Zongmin Zhang, Zhen Sun et al. · The University of Hong Kong · The Hong Kong University of Science and Technology +2 more

Backdoor attack on 6DoF pose estimation using 3D object triggers to induce controlled erroneous rotations and translations with 100% ASR

Model Poisoning vision

1 citations PDF Code

tool arXiv Dec 22, 2025 · Dec 2025

DREAM: Dynamic Red-teaming across Environments for AI Models

Liming Lu, Xiang Gu, Junyu Huang et al. · Nanjing University of Science and Technology · The University of Hong Kong +3 more

Automated red-teaming tool for LLM agents that chains 1,986 atomic attacks across 349 environments, achieving 70%+ bypass rates

Prompt Injection Excessive Agency nlp

PDF

defense arXiv Dec 11, 2025 · Dec 2025

Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs

Han Yang, Shaofeng Li, Tian Dong et al. · Southeast University · The University of Hong Kong

Embeds hardware-anchored backdoors in DNNs as active access control, making stolen models useless without an authorized trigger

Model Theft Model Poisoning vision

PDF Code

defense arXiv Nov 26, 2025 · Nov 2025

GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision

Yuxiao Xiang, Junchi Chen, Zhenchao Jin et al. · University of Science and Technology of China · Anhui Province Key Laboratory of Digital Security +1 more

Defends VLMs against unsafe intermediate reasoning by auditing the full Question-Thinking-Answer pipeline with a vision-aware safety guard

Prompt Injection multimodalnlp

PDF

defense arXiv Nov 10, 2025 · Nov 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Zhisheng Zhang, Derui Wang, Yifan Mi et al. · Tsinghua University · Beijing University of Posts and Telecommunications +4 more

Proactive adversarial audio perturbations disrupt LLM-based voice cloning by targeting speaker encoders and ASR transcription simultaneously

Input Manipulation Attack Output Integrity Attack audionlp

PDF Code

benchmark arXiv Oct 23, 2025 · Oct 2025

GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

Chiyu Chen, Xinhao Song, Yunkai Chai et al. · Shanghai Jiao Tong University · Shanghai Artificial Intelligence Laboratory +1 more

Benchmark evaluating VLM mobile agents against environmental injection attacks via adversarial UI overlays and spoofed notifications in Android emulators

Prompt Injection Excessive Agency multimodalvision

3 citations PDF Code

defense arXiv Oct 15, 2025 · Oct 2025

Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training

Yisen Wang, Yichuan Mo, Hongjun Wang et al. · Peking University · The University of Hong Kong

Meta-learning adversarial training framework that resolves natural-robust and multi-norm robustness trade-offs via specialized base learners

Input Manipulation Attack vision

2 citations PDF

defense arXiv Oct 5, 2025 · Oct 2025

COSMO-RL: Towards Trustworthy LMRMs via Joint Safety and Stability

Yizhuo Ding, Mingkang Chen, Qiuhua Liu et al. · Fudan University · Shanghai AI Laboratory +3 more

Defends large multimodal reasoning models against jailbreaks via multi-objective RL that jointly optimizes safety and reasoning capability

Prompt Injection multimodalnlpvisionreinforcement-learning

PDF

attack arXiv Oct 2, 2025 · Oct 2025

Mirage Fools the Ear, Mute Hides the Truth: Precise Targeted Adversarial Attacks on Polyphonic Sound Event Detection Systems

Junjie Su, Weifei Jin, Yuxin Cao et al. · Beijing University of Posts and Telecommunications · National University of Singapore +2 more

First targeted adversarial attack framework for polyphonic SED, inserting or deleting sound events with precise region control via preservation loss

Input Manipulation Attack audio

PDF

benchmark arXiv Sep 9, 2025 · Sep 2025

How Far Are We from True Unlearnability?

Kai Ye, Liangcai Su, Chenxiong Qian · The University of Hong Kong

Benchmarks unlearnable example defenses, revealing cross-task failures and proposing Sharpness-Aware Learnability metrics to quantify data unlearnability

Data Poisoning Attack vision

PDF

attack arXiv Sep 9, 2025 · Sep 2025

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

Kai Ye, Liangcai Su, Chenxiong Qian · The University of Hong Kong

Poisons RAG documentation corpora to hijack LLM code generation into recommending malicious software packages via embedded jailbreaks

Input Manipulation Attack Prompt Injection nlp

PDF Code

defense arXiv Sep 1, 2025 · Sep 2025

LiFeChain: Lightweight Blockchain for Secure and Efficient Federated Lifelong Learning in IoT

Handi Chen, Jing Deng, Xiuzhe Wu et al. · The University of Hong Kong

Blockchain-based consensus protocol defends federated lifelong learning against persistent poisoning attacks from malicious IoT clients

Data Poisoning Attack federated-learning

PDF

attack arXiv Aug 8, 2025 · Aug 2025

Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

Haorui He, Yupeng Li, Bin Benjamin Zhu et al. · Hong Kong Baptist University · The University of Hong Kong +1 more

Poisons RAG knowledge bases of LLM fact-checkers by mimicking claim decomposition and exploiting justifications to craft targeted malicious evidence

Data Poisoning Attack Prompt Injection nlp

PDF Code

defense arXiv Aug 7, 2025 · Aug 2025

When Deepfake Detection Meets Graph Neural Network:a Unified and Lightweight Learning Framework

Haoyu Liu, Chaoyu Gong, Mengke He et al. · Nanyang Technological University · University of Southern California +1 more

Lightweight GNN framework unifying spatial, spectral, and temporal cues for cross-domain deepfake video detection

Output Integrity Attack vision

PDF Code

defense arXiv Aug 4, 2025 · Aug 2025

Protego: User-Centric Pose-Invariant Privacy Protection Against Face Recognition-Induced Digital Footprint Exposure

Ziling Wang, Shuya Yang, Jialin Lu et al. · The University of Hong Kong

Defends facial images from FR-based retrieval systems via pose-invariant adversarial perturbations that prevent matching even among protected images

Input Manipulation Attack vision

PDF

Loading more papers…

Latest papers

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection

A Secure and Private Distributed Bayesian Federated Learning Design

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

6DAttack: Backdoor Attacks in the 6DoF Pose Estimation

DREAM: Dynamic Red-teaming across Environments for AI Models

Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs

GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training

COSMO-RL: Towards Trustworthy LMRMs via Joint Safety and Stability

Mirage Fools the Ear, Mute Hides the Truth: Precise Targeted Adversarial Attacks on Polyphonic Sound Event Detection Systems

How Far Are We from True Unlearnability?

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

LiFeChain: Lightweight Blockchain for Secure and Efficient Federated Lifelong Learning in IoT

Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

When Deepfake Detection Meets Graph Neural Network:a Unified and Lightweight Learning Framework

Protego: User-Centric Pose-Invariant Privacy Protection Against Face Recognition-Induced Digital Footprint Exposure

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue