ML Security Papers

Latest papers

19 papers

tool · arXiv · Apr 30, 2026

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

Yanting Wang, Chenlong Yin, Ying Chen et al. · The Pennsylvania State University

Efficient red-teaming framework that achieves a 2-7x speedup for optimization-based prompt injection and knowledge corruption attacks on long-context LLMs

Prompt Injection · Red-Team Agents · Benchmarks & Evaluation · nlp

PDF Code

benchmark · arXiv · Apr 9, 2026

PIArena: A Platform for Prompt Injection Evaluation

Runpeng Geng, Chenlong Yin, Yanting Wang et al. · The Pennsylvania State University

Unified benchmark platform for evaluating prompt injection attacks and defenses across diverse datasets with adaptive strategy-based attacks

Prompt Injection · nlp

PDF Code

defense · arXiv · Apr 8, 2026

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye, Zhechang Zhang, Jinyuan Jia et al. · The Pennsylvania State University

Prevents LLM tool poisoning by auto-generating trusted tool descriptions from source code via static analysis and dynamic verification

Prompt Injection · Insecure Plugin Design · nlp

PDF

defense · arXiv · Apr 1, 2026

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng et al. · The Pennsylvania State University

Rule-based prompt injection detector using causal attribution to identify malicious context segments in long-context LLM agents

Prompt Injection · Excessive Agency · nlp

PDF Code

attack · arXiv · Mar 19, 2026

Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks

Jiahao Zhang, Yilong Wang, Suhang Wang · The Pennsylvania State University

Adversarial attack that exploits graph unlearning by injecting nodes designed to corrupt GNN performance once their deletion is requested

Model Skewing · Data Poisoning Attack · graph

PDF

attack · arXiv · Mar 13, 2026

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Chenlong Yin, Runpeng Geng, Yanting Wang et al. · The Pennsylvania State University

RL-based adaptive prompt injection attack that systematically breaks state-of-the-art LLM defenses using entropy regularization and advantage weighting

Prompt Injection · Red-Team Agents · nlp

PDF Code

attack · arXiv · Feb 6, 2026

Extended to Reality: Prompt Injection in 3D Environments

Zhuoheng Li, Ying Chen · The Pennsylvania State University

Physical-world prompt injection attack that places text-bearing 3D objects to hijack MLLM outputs across diverse camera trajectories

Input Manipulation Attack · Prompt Injection · vision · multimodal

PDF Code

attack · arXiv · Feb 2, 2026

Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks

Bohan Wang, Zewen Liu, Lu Lin et al. · Emory University · The Pennsylvania State University +2 more

Adversarially decouples time series classifier predictions from their explanations, enabling targeted misclassification concealed by plausible-looking explanations

Input Manipulation Attack · timeseries

PDF

defense · arXiv · Jan 31, 2026

Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models

Jiaxi Yang, Shicheng Liu, Yuchen Yang et al. · The Pennsylvania State University

Proposes configurable refusal for VLMs based on activation steering, adaptively balancing under- and over-refusal

Prompt Injection · vision · nlp · multimodal

PDF

benchmark · arXiv · Jan 25, 2026

A Systemic Evaluation of Multimodal RAG Privacy

Ali Al-Lawati, Suhang Wang · The Pennsylvania State University

Empirically evaluates membership inference (MIA) and caption extraction attacks against private multimodal RAG databases using black-box crafted prompts

Membership Inference Attack · Sensitive Information Disclosure · vision · nlp · multimodal

PDF Code

defense · arXiv · Nov 13, 2025

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization

Runpeng Geng, Yanting Wang, Chenlong Yin et al. · The Pennsylvania State University

Defends long-context LLMs against prompt injection by sanitizing high-attention tokens that drive injected instruction-following behavior

Prompt Injection · nlp

3 citations · 1 influential · PDF Code

attack · EMNLP · Nov 5, 2025

From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation

Najrin Sultana, Md Rafi Ur Rashid, Kang Gu et al. · The Pennsylvania State University · Dartmouth College

LLM-driven adversarial text generation that fools LLM classifiers via semantics-preserving perturbations without gradient access

Prompt Injection · nlp

PDF Code

defense · arXiv · Oct 14, 2025

PromptLocate: Localizing Prompt Injection Attacks

Yuqi Jia, Yupei Liu, Zedian Shao et al. · Duke University · The Pennsylvania State University

First prompt injection localization method for LLMs, pinpointing injected instructions and data for post-attack forensics

Prompt Injection · nlp

8 citations · 1 influential · PDF

defense · CCS · Sep 26, 2025

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors

Bochuan Cao, Changjiang Li, Yuanpu Cao et al. · The Pennsylvania State University · Palo Alto Networks +1 more

Attacks GPT-4o and Claude to extract system prompts, then defends against such leakage with SysVec, which encodes system prompts as hidden internal vectors

Sensitive Information Disclosure · nlp

5 citations · 1 influential · PDF

tool · EMNLP · Sep 24, 2025

Unmasking Fake Careers: Detecting Machine-Generated Career Trajectories via Multi-layer Heterogeneous Graphs

Michiharu Yamashita, Thanh Tran, Delvin Ce Zhang et al. · The Pennsylvania State University · Amazon +1 more

Novel graph-based detection system for LLM-generated fake resume trajectories, outperforming text-based detectors by up to 85%

Output Integrity Attack · nlp · graph

3 citations · PDF Code

attack · arXiv · Sep 15, 2025

Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time

Yifan Lan, Yuanpu Cao, Weitong Zhang et al. · The Pennsylvania State University · The University of North Carolina at Chapel Hill

Gradient-optimized adversarial images hijack MLLM output preferences at inference time with transferable universal perturbations

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal

PDF Code

attack · arXiv · Aug 24, 2025

Exposing Privacy Risks in Graph Retrieval-Augmented Generation

Jiale Liu, Jiahao Zhang, Suhang Wang · The Pennsylvania State University

Extracts private entities and relationships from Graph RAG systems, revealing more severe leakage of structured data than in standard RAG

Sensitive Information Disclosure · nlp · graph

PDF

defense · arXiv · Aug 16, 2025

Optimizing Token Choice for Code Watermarking: An RL Approach

Zhimeng Guo, Huaisheng Zhu, Siyuan Xu et al. · The Pennsylvania State University

RL-trained policy biases token choices during LLM code generation to embed detectable IP-protection watermarks without breaking functionality

Output Integrity Attack · nlp

PDF

defense · arXiv · Jan 7, 2025

TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning

Yupei Liu, Yanting Wang, Jinyuan Jia · The Pennsylvania State University

Data-free defense that detects and removes trojan triggers from test inputs in self-supervised learning encoders

Model Poisoning · vision

PDF
