Varun Notibala

Tool · arXiv · Dec 5, 2025

Mahesh Kumar Nandwana, Youngwan Lim, Joseph Liu et al. · Roblox

Deploys a fine-tuned LLM guardrail that detects jailbreaks and harmful content across dynamic, user-defined safety taxonomies.

Prompt Injection · NLP

Papers in Database (1)