Latest papers

3 papers
attack · arXiv · Dec 7, 2025

RunawayEvil: Jailbreaking the Image-to-Video Generative Models

Songping Wang, Rufan Qian, Yueming Lyu et al. · Nanjing University · Meituan +1 more

Self-evolving RL+LLM jailbreak framework for Image-to-Video models outperforms baselines by up to 79% via coordinated text-image attacks

Prompt Injection · multimodal · generative · vision · nlp
2 citations · PDF
attack · arXiv · Nov 1, 2025

Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack

Peng Ding, Jun Kuang, Wen Sun et al. · Nanjing University · Meituan

Jailbreaks LLMs via minimal intent-shifting text edits, bypassing safety filters with natural, human-readable prompts

Prompt Injection · nlp
PDF · Code
attack · arXiv · Sep 18, 2025

MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models

Siyu Yan, Long Zeng, Xuecheng Wu et al. · East China Normal University · Xi’an Jiaotong University +2 more

Attacks multi-turn LLM safety via MCTS-guided frame-semantic trajectories; defends with early-intervention dialogue alignment

Prompt Injection · nlp
PDF · Code