Yaniv Nemcovsky

h-index: 4 · 32 citations · 10 papers (total)

Papers in Database (1)

benchmark arXiv Nov 5, 2025 · Nov 2025

Silenced Biases: The Dark Side LLMs Learned to Refuse

Rom Himelstein, Amit LeVi, Brit Youngmann et al. · Technion - Israel Institute of Technology

Benchmark that uses activation steering to bypass refusals, revealing LLM biases that safety alignment otherwise masks

Prompt Injection nlp
2 citations · PDF · Code