defense arXiv Feb 3, 2026
Zhenhao Zhu, Yue Liu, Yanpei Guo et al. · Tsinghua University · National University of Singapore +2 more
Reasoning-based omni-modal guardrail using SFT+GRPO to detect harmful text, image, and video LLM outputs
Prompt Injection multimodal nlp vision
We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus of 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm that incentivizes the model to deliberate before making decisions: (1) SFT, which cold-starts the model with explicit reasoning capabilities and structural adherence; and (2) RL, which incorporates an error-driven exploration reward to incentivize deeper reasoning on hard samples. We release a suite of models at the 2B and 4B parameter scales. Extensive experiments demonstrate that GuardReasoner-Omni outperforms existing state-of-the-art baselines across various guardrail benchmarks. Notably, GuardReasoner-Omni (2B) surpasses the runner-up by 5.3% in F1 score.
vlm llm multimodal Tsinghua University · National University of Singapore · Sun Yat-Sen University +1 more
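For readers unfamiliar with the F1 metric cited in the abstract, here is a minimal sketch of how it is typically computed for a binary guardrail decision. The label names (`harmful`, `unharmful`), the helper `f1_score`, and the toy data are illustrative assumptions; the paper's exact evaluation protocol is not stated in this listing.

```python
def f1_score(y_true, y_pred, positive="harmful"):
    """Binary F1 for the `positive` class (label names are assumed, not from the paper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground-truth labels and guardrail predictions.
labels = ["harmful", "harmful", "unharmful", "harmful", "unharmful"]
preds = ["harmful", "unharmful", "unharmful", "harmful", "harmful"]
print(f"F1 = {f1_score(labels, preds):.4f}")
```

In this toy case precision and recall are both 2/3, so F1 is also 2/3. A gap such as the one reported in the abstract is the difference in this score between two models on the same benchmarks.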