Latest papers

36 papers
attack arXiv Mar 31, 2026 · 8d ago

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi · Purdue University

Backdoor attack on federated learning that uses semantic triggers, such as sunglasses, to evade robust aggregation defenses

Model Poisoning Data Poisoning Attack vision federated-learning
PDF
defense arXiv Mar 30, 2026 · 9d ago

Lipschitz verification of neural networks through training

Simon Kuang, Yuezhu Xu, S. Sivaranjani et al. · University of California · Purdue University

Trains certifiably robust neural networks by penalizing the trivial Lipschitz bound, achieving tight provable robustness guarantees

Input Manipulation Attack vision
PDF
tool arXiv Feb 9, 2026 · 8w ago

One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning

Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi · Purdue University

Exposes PRNG implementation weaknesses in ML frameworks as covert attack vectors and defends against them with RNGGuard, a static-plus-runtime enforcement tool

AI Supply Chain Attacks
PDF
survey arXiv Feb 6, 2026 · 8w ago

Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski et al. · IARPA · NIST +13 more

Surveys multi-year findings from the IARPA TrojAI program on AI backdoor detection via weight analysis and trigger inversion

Model Poisoning vision nlp
PDF
attack arXiv Feb 6, 2026 · 8w ago

Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting

Joshua Ward, Chi-Hua Wang, Guang Cheng · University of California, Los Angeles · Purdue University

Proposes MT-MIA, a graph-based membership inference attack exposing user-level privacy leakage in multi-table synthetic relational databases

Membership Inference Attack tabular graph
PDF Code
attack arXiv Jan 27, 2026 · 10w ago

Membership Inference Attacks Against Fine-tuned Diffusion Language Models

Yuetian Chen, Kaiyuan Zhang, Yuntao Du et al. · Purdue University · Cisco

Proposes SAMA, a membership inference attack exploiting mask aggregation to expose privacy vulnerabilities in diffusion language models

Membership Inference Attack nlp
PDF
benchmark arXiv Jan 23, 2026 · 10w ago

On the Effects of Adversarial Perturbations on Distribution Robustness

Yipei Wang, Zhaoying Pan, Xiaoqian Wang · Purdue University

Theoretical analysis showing ℓ∞ adversarial training can improve distribution robustness when data bias is moderate and feature separability is high

Input Manipulation Attack vision
PDF
attack arXiv Jan 6, 2026 · Jan 2026

Window-based Membership Inference Attacks Against Fine-tuned Large Language Models

Yuetian Chen, Yuntao Du, Kaiyuan Zhang et al. · Purdue University · Cisco Research +1 more

Sliding-window MIA against fine-tuned LLMs captures localized memorization signals, achieving 2-3x better detection than global-loss baselines

Membership Inference Attack nlp
PDF
attack arXiv Dec 31, 2025 · Dec 2025

The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition

Xiaoze Liu, Weichen Yu, Matt Fredrikson et al. · Purdue University · Carnegie Mellon University

Engineers a stealthy breaker token that lies dormant in donor LLMs but activates as a trojan after tokenizer transplant into a base model

AI Supply Chain Attacks Model Poisoning nlp
1 citation PDF Code
attack Asia-Pacific Computer Systems ... Dec 1, 2025 · Dec 2025

Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory

Chenyi Wang, Yanmao Man, Raymond Muller et al. · University of Arizona · HERE Technologies +3 more

Physical adversarial trajectory attack that transfers tracked IDs between objects in MOT systems, bypassing object detection with 100% white-box success

Input Manipulation Attack vision
1 citation PDF
defense arXiv Nov 25, 2025 · Nov 2025

BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents

Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley et al. · Purdue University · Perplexity AI

Benchmarks indirect prompt injection in AI browser agents and proposes multi-layered architectural and model-based defenses

Prompt Injection Excessive Agency nlp
7 citations PDF
defense arXiv Nov 24, 2025 · Nov 2025

Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation

Shristi Das Biswas, Arani Roy, Kaushik Roy · Purdue University

Defends T2I/T2V diffusion models against harmful content generation via instant training-free weight modification robust to red-teaming

Output Integrity Attack vision generative
1 citation PDF
tool arXiv Nov 16, 2025 · Nov 2025

SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs

Shail Desai, Aditya Pawar, Li Lin et al. · Purdue University · State University of New York

Deploys an open multimodal deepfake detection platform combining traditional detectors with MLLM-based explainable forensic reasoning for images and audio

Output Integrity Attack vision audio multimodal
PDF Code
defense arXiv Nov 13, 2025 · Nov 2025

Fairness-Aware Deepfake Detection: Leveraging Dual-Mechanism Optimization

Feng Ding, Wenhui Yi, Yunpeng Zhou et al. · Nanchang University · Shenzhen University +1 more

Fairness-aware deepfake detector using channel decoupling and distribution alignment to reduce demographic bias without sacrificing accuracy

Output Integrity Attack vision
PDF
attack IEEE IoT-J Nov 10, 2025 · Nov 2025

Adversarial Node Placement in Decentralized Federated Learning: Maximum Spanning-Centrality Strategy and Performance Analysis

Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis et al. · Purdue University · Princeton University +1 more

Proposes MaxSpAN-FL, a hybrid topology-aware strategy for placing Byzantine nodes in decentralized FL to maximize model degradation

Data Poisoning Attack federated-learning
PDF
attack arXiv Oct 20, 2025 · Oct 2025

VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models

Qilin Liao, Anamika Lochab, Ruqi Zhang · Purdue University

Variational inference framework that generates coupled adversarial text-image prompts to jailbreak VLMs, achieving a 53.75% higher attack success rate (ASR) than state-of-the-art methods on GPT-4o

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
benchmark arXiv Oct 18, 2025 · Oct 2025

Fit for Purpose? Deepfake Detection in the Real World

Guangyu Lin, Li Lin, Christina P. Walker et al. · Purdue University

Benchmarks deepfake detectors on real-world political deepfakes, revealing poor generalization and vulnerability to simple manipulations

Output Integrity Attack vision multimodal
2 citations PDF
defense arXiv Oct 13, 2025 · Oct 2025

The Easy Path to Robustness: Coreset Selection using Sample Hardness

Pranav Ramesh, Arjun Roy, Deepak Ravikumar et al. · Indian Institute of Technology Madras · Purdue University

Defends against adversarial examples by selecting 'easy' low-gradient-norm training samples via the EasyCore coreset algorithm

Input Manipulation Attack vision
PDF
benchmark arXiv Oct 8, 2025 · Oct 2025

PEAR: Planner-Executor Agent Robustness Benchmark

Shen Dong, Mingxuan Zhang, Pengfei He et al. · Michigan State University · Purdue University +1 more

Benchmark for evaluating adversarial robustness of LLM planner-executor multi-agent systems across harmful action, privacy, and DoS attacks

Prompt Injection Excessive Agency nlp
PDF Code
attack arXiv Oct 7, 2025 · Oct 2025

Membership Inference Attacks on Tokenizers of Large Language Models

Meng Tong, Yuntao Du, Kejiang Chen et al. · University of Science and Technology of China · Purdue University

Exploits LLM tokenizers as a new membership inference attack vector, achieving an AUC of 0.771 against state-of-the-art tokenizers

Membership Inference Attack nlp
PDF