Latest papers

6 papers
tool arXiv Feb 10, 2026

Detecting Jailbreak Attempts in Clinical Training LLMs Through Automated Linguistic Feature Extraction

Tri Nguyen, Huy Hoang Bao Le, Lohith Srikanth Pentapalli et al. · University of Cincinnati

Two-layer jailbreak detector using BERT-extracted linguistic features to catch unsafe prompt manipulation in clinical training LLMs

Prompt Injection nlp
benchmark arXiv Feb 2, 2026

A Comparative Study of Adversarial Robustness in CNN and CNN-ANFIS Architectures

Kaaustaaub Shankar, Bharadwaj Dogga, Kelly Cohen · University of Cincinnati

Benchmarks adversarial robustness of neuro-fuzzy CNN hybrids under PGD and Square attacks, finding architecture-dependent effects

Input Manipulation Attack vision
benchmark arXiv Jan 19, 2026

Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift

Daniel Vennemeyer, Punya Syon Pandey, Phan Anh Duong et al. · University of Cincinnati · University of Toronto +1 more

Compares six LLM fine-tuning objectives and finds ORPO and KL-regularization best preserve jailbreak resistance and alignment at scale

Transfer Learning Attack Prompt Injection nlp
survey arXiv Sep 14, 2025

Membership Inference Attacks on Recommender System: A Survey

Jiajie He, Xintong Chen, Xinyang Fang et al. · University of Maryland · University of Cincinnati +1 more

Surveys membership inference attacks on recommender systems, covering taxonomy, attack designs, defenses, and future directions

Membership Inference Attack nlp graph
benchmark arXiv Sep 12, 2025

When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review

Changjia Zhu, Junjie Xiong, Renkai Ma et al. · University of South Florida · Missouri University of Science and Technology +2 more

Evaluates LLM peer reviewer bias and susceptibility to indirect prompt injection via covert instructions embedded in academic paper PDFs

Prompt Injection nlp
attack arXiv Aug 26, 2025

Membership Inference Attacks on LLM-based Recommender Systems

Jiajie He, Min-Chun Chen, Xintong Chen et al. · University of Maryland · University of Cincinnati +1 more

Designs four membership inference attacks that infer private user history from LLM recommender-system prompts via similarity, memorization, inquiry, and poisoning

Membership Inference Attack Sensitive Information Disclosure nlp