Latest papers

14 papers
benchmark arXiv Apr 1, 2026 · 5d ago

ClawSafety: "Safe" LLMs, Unsafe Agents

Bowen Wei, Yunbei Zhang, Jinhao Pan et al. · George Mason University · Tulane University +2 more

Benchmark of 120 prompt injection attacks on personal AI agents across skill files, emails, and web content

Prompt Injection Excessive Agency nlp multimodal
PDF
defense arXiv Feb 9, 2026 · 8w ago

Verifying DNN-based Semantic Communication Against Generative Adversarial Noise

Thanh Le, Hai Duong, ThanhVu Nguyen et al. · National Institute of Information and Communications Technology · George Mason University

Formal verification framework certifies DNN robustness against adversarial noise in semantic communication via a multi-network MIP formulation

Input Manipulation Attack vision
PDF
defense arXiv Jan 29, 2026 · 9w ago

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu · George Mason University

Distillation-based LLM unlearning embeds refusal into model parameters to resist reverse-prompt attacks that recover forgotten sensitive knowledge

Sensitive Information Disclosure Prompt Injection nlp
PDF Code
defense arXiv Jan 29, 2026 · 9w ago

MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models

Ya Jiang, Massieh Kordi Boroujeny, Surender Suresh Kumar et al. · George Mason University

Distortion-free multi-bit LLM output watermark achieving 8-12% higher bit accuracy than prior methods with no text quality degradation

Output Integrity Attack nlp
PDF
survey SSRN Jan 26, 2026 · 10w ago

NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey

Dhiman Goswami, Jai Kruthunz Naveen Kumar, Sanchari Das · George Mason University

Surveys privacy risks in social media NLP, evaluating membership inference and attribute inference attacks across sentiment, dialect, and emotion models

Membership Inference Attack Model Inversion Attack nlp
PDF
defense arXiv Jan 22, 2026 · 10w ago

CodeGuard: Improving LLM Guardrails in CS Education

Nishat Raihan, Noah Erdachew, Jayoti Devi et al. · George Mason University · University of Oklahoma +1 more

Defends educational LLM coding assistants against unsafe prompts via PromptShield, a fine-tuned guardrail achieving 0.93 F1

Prompt Injection nlp
PDF Code
attack arXiv Dec 15, 2025 · Dec 2025

PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility

Md Nahid Hasan Shuvo, Moinul Hossain · George Mason University

Physical adversarial attack uses anamorphic art to fool CAV object detectors with 90%+ success across four detector architectures

Input Manipulation Attack vision
PDF
attack arXiv Dec 11, 2025 · Dec 2025

FLARE: A Wireless Side-Channel Fingerprinting Attack on Federated Learning

Md Nahid Hasan Shuvo, Moinul Hossain, Anik Mallik et al. · George Mason University · Towson University +1 more

Side-channel attack infers FL client model architecture from encrypted Wi-Fi traffic with 98% F1-score

Model Theft federated-learning
PDF
attack arXiv Oct 30, 2025 · Oct 2025

FGGM: Formal Grey-box Gradient Method for Attacking DRL-based MU-MIMO Scheduler

Thanh Le, Hai Duong, Yusheng Ji et al. · The Graduate University for Advanced Studies · National Institute of Informatics +2 more

Grey-box attack on DRL-based 5G schedulers uses polytope abstract domains to craft adversarial CSI inputs degrading victim throughput by 70%

Input Manipulation Attack reinforcement-learning
1 citation PDF
attack arXiv Oct 24, 2025 · Oct 2025

NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge

Hanyu Zhu, Lance Fiondella, Jiawei Yuan et al. · University of Massachusetts Dartmouth · George Mason University

Neuron-guided genetic attack injects adversarial passages into RAG knowledge bases to override LLM internal memory with 90%+ success

Input Manipulation Attack Prompt Injection nlp
1 citation PDF
defense arXiv Oct 12, 2025 · Oct 2025

Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection

Gaojian Wang, Feng Lin, Tong Wu et al. · Zhejiang University · George Mason University

Self-supervised vision foundation model with novel masking objectives for generalizable deepfake, diffusion-generated face, and face anti-spoofing detection

Output Integrity Attack vision
PDF Code
benchmark arXiv Sep 12, 2025 · Sep 2025

When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review

Changjia Zhu, Junjie Xiong, Renkai Ma et al. · University of South Florida · Missouri University of Science and Technology +2 more

Evaluates LLM peer reviewer bias and susceptibility to indirect prompt injection via covert instructions embedded in academic paper PDFs

Prompt Injection nlp
PDF
attack arXiv Sep 11, 2025 · Sep 2025

The Coding Limits of Robust Watermarking for Generative Models

Danilo Francati, Yevin Nikhel Goonatilake, Shubham Pawar et al. · Sapienza University of Rome · George Mason University +1 more

Proves that binary watermarks for generative models break above 50% bit corruption and demonstrates that crop-and-resize attacks defeat real image watermarking

Output Integrity Attack generative vision
PDF
attack arXiv Aug 8, 2025 · Aug 2025

When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation

Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese et al. · RSAC Labs · George Mason University

Attacks LLM-based IT operations agents via adversarial telemetry injection, then proposes sanitization-based defenses

Prompt Injection nlp
PDF Code