Latest papers

2 papers
attack · arXiv · Oct 23, 2025

Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training

Zheng-Xin Yong, Stephen H. Bach · Brown University

Shows that reasoning LLMs self-jailbreak via chain-of-thought after benign math/code fine-tuning, even while recognizing the requests as harmful

Transfer Learning · Attack · Prompt Injection · nlp
PDF · Code
tool · arXiv · Aug 21, 2025

PickleBall: Secure Deserialization of Pickle-based Machine Learning Models (Extended Report)

Andreas D. Kellas, Neophytos Christou, Wenxin Jiang et al. · Columbia University · Brown University +4 more

Defends against malicious pickle-based ML models on Hugging Face via static analysis and dynamic policy enforcement at load time

AI Supply Chain Attacks
PDF