defense 2025

OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning

Boyu Zhu 1, Xiaofei Wen 2, Wenjie Jacky Mo 2, Tinghui Zhu 2, Yanan Xie 3, Peng Qi 3, Muhao Chen 2

0 citations · 63 references · arXiv


Published on arXiv: 2512.02306

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

OmniGuard achieves strong effectiveness and generalization across 15 multimodal safety benchmarks, outperforming prior unimodal guardrail systems across text, image, video, and audio modalities.

OmniGuard

Novel technique introduced


Omni-modal Large Language Models (OLLMs), which process text, images, videos, and audio, introduce new challenges for safety and value guardrails in human-AI interaction. Prior guardrail research largely targets unimodal settings and typically frames safeguarding as binary classification, which limits robustness across diverse modalities and tasks. To address this gap, we propose OmniGuard, the first family of omni-modal guardrails that performs safeguarding across all modalities with deliberate reasoning. To support the training of OmniGuard, we curate a large, comprehensive omni-modal safety dataset of over 210K diverse samples, whose inputs cover all modalities through both unimodal and cross-modal samples. Each sample is annotated with structured safety labels and carefully curated safety critiques obtained from expert models through targeted distillation. Extensive experiments on 15 benchmarks show that OmniGuard achieves strong effectiveness and generalization across a wide range of multimodal safety scenarios. Importantly, OmniGuard provides a unified framework that enforces policies and mitigates risks across all modalities, paving the way toward more robust and capable omni-modal safeguarding systems.
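To make the dataset description concrete, here is a minimal sketch of what one training record might look like. The summary does not publish the schema, so everything here is an assumption: the class name `SafetySample`, the field names, the two-value label set, and the `categories` list are all hypothetical. The only grounded parts are the four modalities, the structured label plus critique per sample, and the unimodal vs. cross-modal distinction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# The four modalities named in the abstract.
MODALITIES = {"text", "image", "video", "audio"}
# Assumed label set; the paper's actual structured labels may be richer.
LABELS = {"safe", "unsafe"}


@dataclass
class SafetySample:
    """Hypothetical layout for one training sample (not the dataset's real schema)."""

    inputs: Dict[str, str]  # modality -> raw text or media path (assumed)
    label: str  # structured safety label, restricted to LABELS here
    categories: List[str] = field(default_factory=list)  # hypothetical harm categories
    critique: Optional[str] = None  # critique distilled from an expert model

    def __post_init__(self) -> None:
        # A sample must carry at least one input.
        if not self.inputs:
            raise ValueError("a sample needs at least one input modality")
        # Reject modalities outside the four covered by the dataset.
        unknown = set(self.inputs) - MODALITIES
        if unknown:
            raise ValueError(f"unknown modalities: {sorted(unknown)}")
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")

    def modalities(self) -> Set[str]:
        return set(self.inputs)

    def is_cross_modal(self) -> bool:
        # Cross-modal here simply means the input spans more than one modality.
        return len(self.inputs) > 1


sample = SafetySample(
    inputs={"text": "Describe what happens in this clip.", "video": "clips/0001.mp4"},
    label="unsafe",
    categories=["violence"],
    critique="The clip depicts ...",
)
print(sample.is_cross_modal())  # True
```

Validating in `__post_init__` keeps malformed records (an unknown modality, an empty input) out of a training set early, which matters when samples are merged from many sources.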


Key Contributions

  • First family of omni-modal guardrail models (OmniGuard) that perform safeguarding across text, image, video, and audio with deliberate chain-of-thought reasoning rather than binary classification
  • Large-scale omni-modal safety dataset of 210K+ samples with structured safety labels and critiques distilled from expert models, covering unimodal and cross-modal inputs
  • Unified omni-modal safety framework evaluated on 15 benchmarks demonstrating strong effectiveness and generalization across multimodal safety scenarios
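The contrast between deliberate reasoning and binary classification can be illustrated on the consumer side. A binary guardrail returns only a label, while a reasoning guardrail returns a critique followed by a verdict, and the caller must separate the two. The sketch below assumes a hypothetical output format with `<reasoning>` and `<verdict>` tags; OmniGuard's real output format is not given in this summary. The fail-closed policy in `is_blocked` is likewise an illustrative design choice, not something the paper specifies.

```python
import re
from dataclasses import dataclass

# Hypothetical output format: free-form reasoning followed by a tagged verdict.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL | re.IGNORECASE)
_VERDICT_RE = re.compile(r"<verdict>\s*(safe|unsafe)\s*</verdict>", re.IGNORECASE)


@dataclass
class GuardDecision:
    reasoning: str  # the critique that justifies the verdict
    verdict: str  # "safe" or "unsafe" (assumed label set)


def parse_guard_output(text: str) -> GuardDecision:
    """Split a reasoning-style guardrail response into critique and verdict."""
    verdict_match = _VERDICT_RE.search(text)
    if verdict_match is None:
        raise ValueError("no verdict found in guardrail output")
    reasoning_match = _REASONING_RE.search(text)
    reasoning = reasoning_match.group(1).strip() if reasoning_match else ""
    return GuardDecision(reasoning=reasoning, verdict=verdict_match.group(1).lower())


def is_blocked(text: str) -> bool:
    """Fail closed: an unparseable response is treated as unsafe."""
    try:
        return parse_guard_output(text).verdict == "unsafe"
    except ValueError:
        return True


response = (
    "<reasoning>The audio asks for step-by-step instructions for a weapon.</reasoning>\n"
    "<verdict>UNSAFE</verdict>"
)
decision = parse_guard_output(response)
print(decision.verdict)  # unsafe
print(is_blocked(response))  # True
```

Keeping the critique next to the verdict means a moderation log can record why an input was blocked, which a bare binary label cannot provide.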

Details

Domains
nlp, vision, audio, multimodal
Model Types
llm, vlm, multimodal
Threat Tags
inference_time
Datasets
OmniGuard safety dataset (210K+ samples, curated), 15 multimodal safety benchmarks (unspecified in excerpt)
Applications
multimodal llm safety, content moderation, omni-modal guardrails