Latest papers

6 papers
tool arXiv Feb 10, 2026

Detecting Jailbreak Attempts in Clinical Training LLMs Through Automated Linguistic Feature Extraction

Tri Nguyen, Huy Hoang Bao Le, Lohith Srikanth Pentapalli et al. · University of Cincinnati

Two-layer jailbreak detector using BERT-extracted linguistic features to catch unsafe prompt manipulation in clinical training LLMs

Prompt Injection nlp
benchmark arXiv Feb 2, 2026

A Comparative Study of Adversarial Robustness in CNN and CNN-ANFIS Architectures

Kaaustaaub Shankar, Bharadwaj Dogga, Kelly Cohen · University of Cincinnati

Benchmarks adversarial robustness of neuro-fuzzy CNN hybrids under PGD and Square attacks, finding architecture-dependent effects

Input Manipulation Attack vision
benchmark arXiv Jan 19, 2026

Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift

Daniel Vennemeyer, Punya Syon Pandey, Phan Anh Duong et al. · University of Cincinnati · University of Toronto +1 more

Compares six LLM fine-tuning objectives and finds ORPO and KL-regularization best preserve jailbreak resistance and alignment at scale

Transfer Learning Attack Prompt Injection nlp
survey arXiv Sep 14, 2025

Membership Inference Attacks on Recommender System: A Survey

Jiajie He, Xintong Chen, Xinyang Fang et al. · University of Maryland · University of Cincinnati +1 more

Surveys membership inference attacks on recommender systems, covering taxonomy, attack designs, defenses, and future directions

Membership Inference Attack nlp graph
benchmark arXiv Sep 12, 2025

When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review

Changjia Zhu, Junjie Xiong, Renkai Ma et al. · University of South Florida · Missouri University of Science and Technology +2 more

Evaluates LLM peer reviewer bias and susceptibility to indirect prompt injection via covert instructions embedded in academic paper PDFs

Prompt Injection nlp
attack arXiv Aug 26, 2025

Membership Inference Attacks on LLM-based Recommender Systems

Jiajie He, Min-Chun Chen, Xintong Chen et al. · University of Maryland · University of Cincinnati +1 more

Designs four membership inference attacks that infer private user history from LLM recommender-system prompts via similarity, memorization, inquiry, and poisoning

Membership Inference Attack Sensitive Information Disclosure nlp