Latest papers

124 papers
attack arXiv Apr 1, 2026 · 5d ago

AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

Ruhao Liu, Weiqi Huang, Qi Li et al. · National University of Singapore

Agentic framework that automates membership inference attacks through self-exploration and strategy evolution, outperforming handcrafted baselines

Membership Inference Attack
PDF Code
attack arXiv Mar 27, 2026 · 10d ago

R-PGA: Robust Physical Adversarial Camouflage Generation via Relightable 3D Gaussian Splatting

Tianrui Lou, Siyuan Liang, Jiawei Liang et al. · Sun Yat-Sen University · National University of Singapore

Physical adversarial camouflage attack on autonomous vehicles using relightable 3D Gaussian splatting for robustness across lighting and viewing angles

Input Manipulation Attack vision
PDF
attack arXiv Mar 27, 2026 · 10d ago

PEANUT: Perturbations by Eigenvalue Alignment for Attacking GNNs Under Topology-Driven Message Passing

Bhavya Kohli, Biplab Sikdar · National University of Singapore

Black-box node injection attack on GNNs exploiting topology-driven message passing via eigenvalue alignment without requiring node features

Input Manipulation Attack graph
PDF
defense arXiv Mar 23, 2026 · 14d ago

Principled Steering via Null-space Projection for Jailbreak Defense in Vision-Language Models

Xingyu Zhu, Beier Zhu, Shuo Wang et al. · University of Science and Technology of China · National University of Singapore +1 more

Null-space projection defense that blocks VLM jailbreaks while preserving benign performance through theoretically-grounded activation steering

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
tool arXiv Mar 19, 2026 · 18d ago

MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning

Zhihui Chen, Kai He, Qingyuan Lei et al. · National University of Singapore · The Chinese University of Hong Kong +3 more

Detects medical image deepfakes via localize-then-analyze reasoning with expert-aligned explanations on synthetic lesion edits

Output Integrity Attack vision multimodal
PDF Code
tool arXiv Mar 19, 2026 · 18d ago

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

Haochen Zhao, Shaoyang Cui · National University of Singapore · Tsinghua University

MITM-based red-teaming framework that tests autonomous web agent security through real-time network traffic manipulation attacks

Prompt Injection Excessive Agency nlp
PDF Code
defense arXiv Mar 18, 2026 · 19d ago

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

Kun Wang, Meng Chen, Junhao Wang et al. · Zhejiang University · Xi’an Jiaotong University +1 more

Black-box backdoor detector for speech models exploiting dual stability anomalies under semantic-breaking and semantic-preserving perturbations

Model Poisoning audio
PDF
defense arXiv Mar 18, 2026 · 19d ago

Proof-of-Authorship for Diffusion-based AI Generated Content

De Zhang Lee, Han Fang, Ee-Chien Chang · National University of Singapore

Cryptographic proof-of-authorship for diffusion-generated images by binding generation seeds to author identity using pseudorandom functions

Output Integrity Attack vision generative
PDF
tool arXiv Mar 18, 2026 · 19d ago

VeriGrey: Greybox Agent Validation

Yuntong Zhang, Sungmin Kang, Ruijie Meng et al. · National University of Singapore · Max Planck Institute for Security and Privacy

Greybox fuzzing framework that discovers indirect prompt injection vulnerabilities in LLM agents by mutating prompts and tracking tool invocations

Prompt Injection Excessive Agency nlp
PDF
attack arXiv Mar 17, 2026 · 20d ago

REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models

Yong Zou, Haoran Li, Fanxiao Li et al. · Yunnan University · Northeastern University +1 more

Black-box adversarial image prompt attack that bypasses concept unlearning in diffusion models, recovering erased copyrighted and harmful concepts

Input Manipulation Attack vision multimodal generative
PDF Code
defense arXiv Mar 14, 2026 · 23d ago

Towards Generalizable Deepfake Detection via Real Distribution Bias Correction

Ming-Hui Liu, Harry Cheng, Xin Luo et al. · Shandong University · National University of Singapore

Deepfake detector exploiting real image distribution invariance to generalize across unseen forgery types and domains

Output Integrity Attack vision
PDF
attack arXiv Mar 13, 2026 · 24d ago

Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

Qichen Zhao, Shengfang Zhai, Xinjian Bai et al. · Peking University · National University of Singapore +1 more

Defeats image protection schemes via purification attacks, removing adversarial perturbations to restore full editability under model mismatch

Output Integrity Attack vision generative
PDF
benchmark arXiv Mar 12, 2026 · 25d ago

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Ching-Yu Kao, Xinfeng Li, Shenyu Dai et al. · Fraunhofer AISEC · Nanyang Technological University +3 more

Benchmarks documentation-embedded indirect prompt injection against high-privilege LLM agents, achieving 85% exfiltration success with 0% human detection rate

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Mar 12, 2026 · 25d ago

OrthoEraser: Coupled-Neuron Orthogonal Projection for Concept Erasure

Chuancheng Shi, Wenhua Wu, Fei Shen et al. · University of Sydney · National University of Singapore +2 more

Defends T2I diffusion models from adversarial induction of harmful content via orthogonal projection that preserves benign semantic subspaces during concept erasure

Prompt Injection vision generative
PDF
defense arXiv Mar 10, 2026 · 27d ago

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

Chao Shuai, Zhenguang Liu, Shaojing Fan et al. · Zhejiang University · National University of Singapore +1 more

Proposes GSD module to block semantic shortcuts in VFM-based detectors, improving generalization to unseen AI-generated image pipelines

Output Integrity Attack vision generative
PDF Code
attack arXiv Mar 9, 2026 · 28d ago

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Guangnian Wan, Xinyin Ma, Gongfan Fang et al. · National University of Singapore

Fine-tunes LLMs via API to covertly embed harmful content in steganographic cover responses, bypassing safety classifiers 100% of the time

Transfer Learning Attack Model Poisoning nlp
PDF Code
attack arXiv Mar 7, 2026 · 4w ago

Targeted Bit-Flip Attacks on LLM-Based Agents

Jialai Wang, Ya Wen, Zhongmou Liu et al. · National University of Singapore · Tsinghua University +1 more

Flip-Agent exploits hardware bit-flips to corrupt LLM agent weights, hijacking tool calls and final outputs in multi-stage pipelines

Model Poisoning Excessive Agency nlp
PDF
defense arXiv Mar 6, 2026 · 4w ago

Word-Anchored Temporal Forgery Localization

Tianyi Wang, Xi Shao, Harry Cheng et al. · National University of Singapore · Nanjing University of Posts and Telecommunications +1 more

Detects audio-visual deepfake segments via word-token binary classification, outperforming regression-based TFL baselines

Output Integrity Attack audio vision multimodal
PDF
attack arXiv Mar 3, 2026 · 4w ago

Scores Know Bob's Voice: Speaker Impersonation Attack

Chanwoo Hwang, Sunpill Kim, Yong Kiam Tan et al. · Hanyang University · A*STAR +2 more

Feature-aligned latent inversion achieves 91% speaker impersonation with 10x fewer black-box score queries

Input Manipulation Attack audio
PDF Code
defense arXiv Feb 28, 2026 · 5w ago

ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data

Haodong Zhao, Jinming Hu, Zhaomin Wu et al. · Shanghai Jiao Tong University · National University of Singapore +1 more

Defends federated LLM instruction tuning against interspersed backdoor poisoning using frequency-domain gradient signals and global clustering

Model Poisoning Data Poisoning Attack nlp federated-learning
PDF Code