Avi Mendelson

h-index: 3 · 32 citations · 12 papers (total)

Papers in Database (4)

benchmark · arXiv · Nov 5, 2025

Silenced Biases: The Dark Side LLMs Learned to Refuse

Rom Himelstein, Amit LeVi, Brit Youngmann et al. · Technion - Israel Institute of Technology

Benchmark that uses activation steering to bypass refusals, revealing hidden LLM biases masked by safety alignment

Prompt Injection · nlp
2 citations · PDF · Code

benchmark · arXiv · Nov 6, 2025

REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs

Liran Cohen, Yaniv Nemcovesky, Avi Mendelson · Technion - Israel Institute of Technology

Neighborhood loss-landscape analysis reveals residual memorization in unlearned LLMs, outperforming existing black-box membership inference methods

Membership Inference Attack · nlp
PDF

defense · arXiv · Feb 1, 2026

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Eliron Rahimi, Elad Hirshel, Rom Himelstein et al. · Technion - Israel Institute of Technology · Ben-Gurion University of the Negev +1 more

Defends AR and diffusion LLMs against jailbreaks using an SRI signal that detects incomplete internal recovery, with 100× lower overhead

Prompt Injection · nlp
PDF · Code

attack · arXiv · Feb 2, 2026

Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

Tomer Kordonsky, Maayan Yamin, Noam Benzimra et al. · Technion - Israel Institute of Technology · Zenity

Exploits LLM code-generation template recurrence to predict hidden backend vulnerabilities from observable frontend features in a black-box attack

Sensitive Information Disclosure · nlp
PDF · Code