Latest papers

6 papers
defense arXiv Feb 4, 2026

Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention

Sagie Dekel, Moshe Tennenholtz, Oren Kurland · Technion - Israel Institute of Technology

Sparse block-attention mechanism (SDAG) prevents cross-document interactions in RAG to defend against corpus poisoning attacks on LLMs

Input Manipulation Attack · Prompt Injection · nlp
defense arXiv Feb 1, 2026

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Eliron Rahimi, Elad Hirshel, Rom Himelstein et al. · Technion - Israel Institute of Technology · Ben-Gurion University of the Negev +1 more

Defends autoregressive (AR) and diffusion LLMs against jailbreaks using an SRI signal that detects incomplete internal recovery, with 100× lower overhead

Prompt Injection · nlp
defense arXiv Nov 13, 2025

Tight Robustness Certification through the Convex Hull of ℓ₀ Attacks

Yuval Shapira, Dana Drachsler-Cohen · Technion - Israel Institute of Technology

Certifies ℓ₀ adversarial robustness against few-pixel attacks 3× faster via tighter bound propagation through the convex hull of the attack space

Input Manipulation Attack · vision
benchmark arXiv Nov 6, 2025

REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs

Liran Cohen, Yaniv Nemcovesky, Avi Mendelson · Technion - Israel Institute of Technology

Analysis of the loss landscape in each input's neighborhood reveals residual memorization in unlearned LLMs, outperforming existing black-box membership inference methods

Membership Inference Attack · nlp
benchmark arXiv Nov 5, 2025

Silenced Biases: The Dark Side LLMs Learned to Refuse

Rom Himelstein, Amit LeVi, Brit Youngmann et al. · Technion - Israel Institute of Technology

Benchmark that uses activation steering to bypass refusals, revealing hidden LLM biases masked by safety alignment

Prompt Injection · nlp
2 citations
attack arXiv Aug 16, 2025

Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous

Ben Nassi, Stav Cohen, Or Yair · Tel Aviv University · Technion - Israel Institute of Technology +1 more

Indirect prompt injection via calendar invites and emails hijacks Gemini assistants to exfiltrate data, spam contacts, and control IoT devices

Prompt Injection · Excessive Agency · nlp