Varun Notibala

h-index: 1 · 89 citations · 2 papers (total)

Papers in Database (1)

tool · arXiv · Dec 5, 2025

Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models

Mahesh Kumar Nandwana, Youngwan Lim, Joseph Liu et al. · Roblox

Deploys a fine-tuned LLM guardrail that detects jailbreaks and harmful content across dynamic, user-defined safety taxonomies

Prompt Injection · nlp