ML Security Papers

Latest papers

2 papers

tool · arXiv · Sep 30, 2025

Guolei Huang, Qinzhi Peng, Gan Xu et al. · Southeast University · RealAI +3 more

Builds a VLM content moderation tool and MCTS red-teaming framework for detecting harmful multi-turn multimodal dialogues

Prompt Injection multimodal nlp

1 citation · PDF

defense · arXiv · Sep 29, 2025

Yichi Zhang, Yue Ding, Jingwen Yang et al. · arXiv · Shanghai Qi Zhi Institute +3 more

Defends Large Reasoning Models against jailbreaks by aligning CoT safety via process-supervised preference optimization with corrective interventions

Prompt Injection nlp

2 citations · 1 influential · PDF