Latest papers

16 papers
defense arXiv Apr 1, 2026 · 5d ago

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng et al. · The Pennsylvania State University

Rule-based prompt injection detector using causal attribution to identify malicious context segments in long-context LLM agents

Prompt Injection Excessive Agency nlp
PDF Code
attack arXiv Mar 19, 2026 · 18d ago

Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks

Jiahao Zhang, Yilong Wang, Suhang Wang · The Pennsylvania State University

Adversarial attack exploiting graph unlearning by injecting nodes designed to corrupt GNN performance when deletion is requested

Model Skewing Data Poisoning Attack graph
PDF
attack arXiv Mar 13, 2026 · 24d ago

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Chenlong Yin, Runpeng Geng, Yanting Wang et al. · The Pennsylvania State University

RL-based adaptive prompt injection attack that systematically breaks state-of-the-art LLM defenses using entropy regularization and advantage weighting

Prompt Injection nlp
PDF Code
attack arXiv Feb 6, 2026 · 8w ago

Extended to Reality: Prompt Injection in 3D Environments

Zhuoheng Li, Ying Chen · The Pennsylvania State University

Physical-world prompt injection attack places text-bearing 3D objects to hijack MLLM outputs across diverse camera trajectories

Input Manipulation Attack Prompt Injection vision multimodal
PDF Code
attack arXiv Feb 2, 2026 · 9w ago

Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks

Bohan Wang, Zewen Liu, Lu Lin et al. · Emory University · The Pennsylvania State University +2 more

Adversarially decouples time series classifier predictions from explanations, enabling targeted misclassification with plausible-looking cover-up explanations

Input Manipulation Attack timeseries
PDF
defense arXiv Jan 31, 2026 · 9w ago

Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models

Jiaxi Yang, Shicheng Liu, Yuchen Yang et al. · The Pennsylvania State University

Proposes activation steering-based configurable refusal for VLMs that adaptively balances under- and over-refusal

Prompt Injection vision nlp multimodal
PDF
benchmark arXiv Jan 25, 2026 · 10w ago

A Systemic Evaluation of Multimodal RAG Privacy

Ali Al-Lawati, Suhang Wang · The Pennsylvania State University

Empirically evaluates MIA and caption extraction attacks against private multimodal RAG databases via black-box prompt crafting

Membership Inference Attack Sensitive Information Disclosure vision nlp multimodal
PDF Code
defense arXiv Nov 13, 2025 · Nov 2025

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization

Runpeng Geng, Yanting Wang, Chenlong Yin et al. · The Pennsylvania State University

Defends long-context LLMs against prompt injection by sanitizing high-attention tokens that drive injected instruction-following behavior

Prompt Injection nlp
3 citations 1 influential PDF Code
attack EMNLP Nov 5, 2025 · Nov 2025

From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation

Najrin Sultana, Md Rafi Ur Rashid, Kang Gu et al. · The Pennsylvania State University · Dartmouth College

LLM-driven adversarial text generation that fools LLM classifiers via semantic-preserving perturbations without gradient access

Prompt Injection nlp
PDF Code
defense arXiv Oct 14, 2025 · Oct 2025

PromptLocate: Localizing Prompt Injection Attacks

Yuqi Jia, Yupei Liu, Zedian Shao et al. · Duke University · The Pennsylvania State University

First prompt injection localization method for LLMs, pinpointing injected instructions and data for post-attack forensics

Prompt Injection nlp
8 citations 1 influential PDF
defense CCS Sep 26, 2025 · Sep 2025

You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors

Bochuan Cao, Changjiang Li, Yuanpu Cao et al. · The Pennsylvania State University · Palo Alto Networks +1 more

Attacks GPT-4o/Claude to extract system prompts, then defends with SysVec encoding prompts as hidden internal vectors

Sensitive Information Disclosure nlp
5 citations 1 influential PDF
tool EMNLP Sep 24, 2025 · Sep 2025

Unmasking Fake Careers: Detecting Machine-Generated Career Trajectories via Multi-layer Heterogeneous Graphs

Michiharu Yamashita, Thanh Tran, Delvin Ce Zhang et al. · The Pennsylvania State University · Amazon +1 more

Novel graph-based detection system for LLM-generated fake resume trajectories, outperforming text-based detectors by up to 85%

Output Integrity Attack nlp graph
3 citations PDF Code
attack arXiv Sep 15, 2025 · Sep 2025

Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time

Yifan Lan, Yuanpu Cao, Weitong Zhang et al. · The Pennsylvania State University · The University of North Carolina at Chapel Hill

Gradient-optimized adversarial images hijack MLLM output preferences at inference time with transferable universal perturbations

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF Code
attack arXiv Aug 24, 2025 · Aug 2025

Exposing Privacy Risks in Graph Retrieval-Augmented Generation

Jiale Liu, Jiahao Zhang, Suhang Wang · The Pennsylvania State University

Extracts private entities and relationships from Graph RAG systems, revealing worse structured data leakage than standard RAG

Sensitive Information Disclosure nlp graph
PDF
defense arXiv Aug 16, 2025 · Aug 2025

Optimizing Token Choice for Code Watermarking: An RL Approach

Zhimeng Guo, Huaisheng Zhu, Siyuan Xu et al. · The Pennsylvania State University

RL-trained policy biases token choices during LLM code generation to embed detectable IP-protection watermarks without breaking functionality

Output Integrity Attack nlp
PDF
defense arXiv Jan 7, 2025 · Jan 2025

TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning

Yupei Liu, Yanting Wang, Jinyuan Jia · The Pennsylvania State University

Data-free defense that detects and removes trojan triggers from test inputs in self-supervised learning encoders

Model Poisoning vision
PDF