Yisroel Mirsky

attack arXiv Feb 5, 2026

LeakBoost: Perceptual-Loss-Based Membership Inference Attack

Amit Kravchik Taub, Fred M. Grabovski, Guy Amit et al. · Ben-Gurion University

Boosts membership inference attacks by synthesizing activation-space interrogation images, raising AUC from near-chance to 0.81–0.88

Membership Inference Attack vision


defense arXiv Jan 27, 2026

GAVEL: Towards rule-based safety through activation monitoring

Shir Rozenfeld, Rahul Pankajakshan, Itay Zloczower et al. · Ben-Gurion University of the Negev · Amrita Vishwa Vidyapeetham

Rule-based LLM safety framework using interpretable activation-level cognitive elements to detect harmful behaviors with high precision and auditability

Prompt Injection nlp

