Latest papers

14 papers
benchmark arXiv Apr 1, 2026 · 5d ago

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan et al. · George Mason University · Tulane University +2 more

Benchmark of 120 prompt injection attacks on personal AI agents across skill files, emails, and web content

Prompt Injection Excessive Agency nlp multimodal
PDF
defense arXiv Feb 9, 2026 · 8w ago

Verifying DNN-based Semantic Communication Against Generative Adversarial Noise

Thanh Le, Hai Duong, ThanhVu Nguyen et al. · National Institute of Information and Communications Technology · George Mason University

Formal verification framework certifies DNN robustness against adversarial noise in semantic communication via a multi-network MIP formulation

Input Manipulation Attack vision
PDF
defense arXiv Jan 29, 2026 · 9w ago

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu · George Mason University

Distillation-based LLM unlearning embeds refusal into model parameters to resist reverse-prompt attacks that recover forgotten sensitive knowledge

Sensitive Information Disclosure Prompt Injection nlp
PDF Code
defense arXiv Jan 29, 2026 · 9w ago

MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models

Ya Jiang, Massieh Kordi Boroujeny, Surender Suresh Kumar et al. · George Mason University

Distortion-free multi-bit LLM output watermark achieving 8-12% higher bit accuracy than prior methods with no text quality degradation

Output Integrity Attack nlp
PDF
survey SSRN Jan 26, 2026 · 10w ago

NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey

Dhiman Goswami, Jai Kruthunz Naveen Kumar, Sanchari Das · George Mason University

Surveys privacy risks in social media NLP, evaluating membership inference and attribute inference attacks across sentiment, dialect, and emotion models

Membership Inference Attack Model Inversion Attack nlp
PDF
defense arXiv Jan 22, 2026 · 10w ago

CodeGuard: Improving LLM Guardrails in CS Education

Nishat Raihan, Noah Erdachew, Jayoti Devi et al. · George Mason University · University of Oklahoma +1 more

Defends educational LLM coding assistants against unsafe prompts via PromptShield, a fine-tuned guardrail achieving 0.93 F1

Prompt Injection nlp
PDF Code
attack arXiv Dec 15, 2025 · Dec 2025

PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility

Md Nahid Hasan Shuvo, Moinul Hossain · George Mason University

Physical adversarial attack uses anamorphic art to fool CAV object detectors with 90%+ success across four detector architectures

Input Manipulation Attack vision
PDF
attack arXiv Dec 11, 2025 · Dec 2025

FLARE: A Wireless Side-Channel Fingerprinting Attack on Federated Learning

Md Nahid Hasan Shuvo, Moinul Hossain, Anik Mallik et al. · George Mason University · Towson University +1 more

Side-channel attack infers FL client model architecture from encrypted Wi-Fi traffic with 98% F1-score

Model Theft federated-learning
PDF
attack arXiv Oct 30, 2025 · Oct 2025

FGGM: Formal Grey-box Gradient Method for Attacking DRL-based MU-MIMO Scheduler

Thanh Le, Hai Duong, Yusheng Ji et al. · The Graduate University for Advanced Studies · National Institute of Informatics +2 more

Grey-box attack on DRL-based 5G schedulers uses polytope abstract domains to craft adversarial CSI inputs degrading victim throughput by 70%

Input Manipulation Attack reinforcement-learning
1 citation PDF
attack arXiv Oct 24, 2025 · Oct 2025

NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge

Hanyu Zhu, Lance Fiondella, Jiawei Yuan et al. · University of Massachusetts Dartmouth · George Mason University

Neuron-guided genetic attack injects adversarial passages into RAG knowledge bases to override LLM internal memory with 90%+ success

Input Manipulation Attack Prompt Injection nlp
1 citation PDF
defense arXiv Oct 12, 2025 · Oct 2025

Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection

Gaojian Wang, Feng Lin, Tong Wu et al. · Zhejiang University · George Mason University

Self-supervised vision foundation model with novel masking objectives for generalizable deepfake, diffusion-generated face, and face anti-spoofing detection

Output Integrity Attack vision
PDF Code
benchmark arXiv Sep 12, 2025 · Sep 2025

When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review

Changjia Zhu, Junjie Xiong, Renkai Ma et al. · University of South Florida · Missouri University of Science and Technology +2 more

Evaluates LLM peer reviewer bias and susceptibility to indirect prompt injection via covert instructions embedded in academic paper PDFs

Prompt Injection nlp
PDF
attack arXiv Sep 11, 2025 · Sep 2025

The Coding Limits of Robust Watermarking for Generative Models

Danilo Francati, Yevin Nikhel Goonatilake, Shubham Pawar et al. · Sapienza University of Rome · George Mason University +1 more

Proves that binary watermarks for generative models break above 50% bit corruption and demonstrates that crop-and-resize attacks defeat real image watermarking

Output Integrity Attack generative vision
PDF
attack arXiv Aug 8, 2025 · Aug 2025

When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation

Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese et al. · RSAC Labs · George Mason University

Attacks LLM-based IT operations agents via adversarial telemetry injection, then proposes sanitization-based defenses

Prompt Injection nlp
PDF Code