Itay Zloczower

h-index: 0 · 0 citations · 1 paper (total)

Papers in Database (1)

defense · arXiv · Jan 27, 2026

GAVEL: Towards rule-based safety through activation monitoring

Shir Rozenfeld, Rahul Pankajakshan, Itay Zloczower et al. · Ben Gurion University of the Negev · Amrita Vishwa Vidyapeetham

Rule-based LLM safety framework using interpretable activation-level cognitive elements to detect harmful behaviors with high precision and auditability

Prompt Injection nlp