ML Security Papers

Latest papers

2 papers

attack arXiv Oct 24, 2025 · Oct 2025

Kieu Dang, Phung Lai, NhatHai Phan et al. · University at Albany · New Jersey Institute of Technology +2 more

LDP noise injection during fine-tuning steals LLM behavior from APIs while evading watermark detectors, achieving 96.95% attack success rate

Model Theft Output Integrity Attack Model Theft nlp

2 citations PDF Code

defense arXiv Aug 19, 2025 · Aug 2025

Jiaming Hu, Haoyu Wang, Debarghya Mukherjee et al. · University at Albany · Boston University

Dual-track prompt-level defense isolates query semantic cores to neutralize LLM jailbreaks including GCG and DeepInception

Input Manipulation Attack Prompt Injection nlp