ML Security Papers

Latest papers

5 papers

attack arXiv Jan 6, 2026

Yuetian Chen, Yuntao Du, Kaiyuan Zhang et al. · Purdue University · Cisco Research +1 more

Sliding-window MIA against fine-tuned LLMs captures localized memorization signals, achieving 2-3x better detection than global-loss baselines

Membership Inference Attack nlp

defense arXiv Jan 5, 2026

Neusha Javidnia, Ruisi Zhang, Ashish Kundu et al. · University of California · Cisco Research

RL-trained LoRA adapters embed detectable watermarks in code LLM outputs, resisting refactoring and adversarial removal attacks

Output Integrity Attack nlp

attack arXiv Nov 9, 2025

Haiyan Zhao, Zirui He, Fan Yang et al. · New Jersey Institute of Technology · Wake Forest University +1 more

Inverts LLM last-token representations to reconstruct the original input text, recovering over half of the information in 16-token sequences

Model Inversion Attack Sensitive Information Disclosure nlp

attack EMNLP Oct 27, 2025

Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen et al. · Virginia Tech · Cisco Research

Extracts verbatim LLM training data by optimizing prompts to spike token entropy, achieving a 22% extraction rate on Llama 2-70B

Model Inversion Attack Sensitive Information Disclosure nlp

defense arXiv Oct 22, 2025

Hanbin Hong, Ashish Kundu, Ali Payani et al. · University of Connecticut · Cisco Research +1 more

Certified adversarial defense based on anisotropic randomized smoothing, improving certified accuracy over isotropic baselines by up to 182.6%

Input Manipulation Attack vision