ML Security Papers

defense · arXiv · Feb 26, 2026

CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety

Umid Suleymanov, Rufiz Bayramov, Suad Gafarli et al. · Virginia Tech · ADA University

A retrieval-augmented multi-agent framework that enforces LLM safety policies through adversarial debate, requires no fine-tuning, and generalizes zero-shot to new governance rules.

Prompt Injection nlp


