Latest papers

3 papers
attack · arXiv · Dec 7, 2025

RunawayEvil: Jailbreaking the Image-to-Video Generative Models

Songping Wang, Rufan Qian, Yueming Lyu et al. · Nanjing University · Meituan +1 more

Self-evolving RL+LLM jailbreak framework for Image-to-Video models outperforms baselines by up to 79% via coordinated text-image attacks

Prompt Injection · multimodal · generative · vision · nlp
2 citations · PDF
attack · arXiv · Nov 1, 2025

Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack

Peng Ding, Jun Kuang, Wen Sun et al. · Nanjing University · Meituan

Jailbreaks LLMs via minimal intent-shifting text edits, bypassing safety filters with natural, human-readable prompts

Prompt Injection · nlp
PDF · Code
attack · arXiv · Sep 18, 2025

MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models

Siyu Yan, Long Zeng, Xuecheng Wu et al. · East China Normal University · Xi’an Jiaotong University +2 more

Attacks multi-turn LLM safety via MCTS-guided frame-semantic trajectories; defends with early-intervention dialogue alignment

Prompt Injection · nlp
PDF · Code