Latest papers

2 papers
tool · arXiv · Sep 30, 2025

LLaVAShield: Safeguarding Multimodal Multi-Turn Dialogues in Vision-Language Models

Guolei Huang, Qinzhi Peng, Gan Xu et al. · Southeast University · RealAI +3 more

Builds a VLM content moderation tool and MCTS red-teaming framework for detecting harmful multi-turn multimodal dialogues

Prompt Injection · multimodal · nlp
1 citation
defense · arXiv · Sep 29, 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Yichi Zhang, Yue Ding, Jingwen Yang et al. · arXiv · Shanghai Qi Zhi Institute +3 more

Defends Large Reasoning Models against jailbreaks by aligning CoT safety via process-supervised preference optimization with corrective interventions

Prompt Injection · nlp
2 citations · 1 influential