Chaim Baskin

h-index: 14 · 1,075 citations · 65 papers (total)

Papers in Database (1)

defense · arXiv · Feb 1, 2026

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Eliron Rahimi, Elad Hirshel, Rom Himelstein et al. · Technion - Israel Institute of Technology · Ben-Gurion University of the Negev +1 more

Defends autoregressive (AR) and diffusion LLMs against jailbreaks using an SRI signal that detects incomplete internal recovery, at 100× lower overhead

Prompt Injection nlp