Latest papers

1 paper
tool · arXiv · Dec 5, 2025

Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models

Mahesh Kumar Nandwana, Youngwan Lim, Joseph Liu et al. · Roblox

Deploys a fine-tuned LLM guardrail that detects jailbreaks and harmful content across dynamic, user-defined safety taxonomies

Prompt Injection · nlp