ML Security Papers

Latest papers

6 papers

defense · arXiv · Feb 4, 2026

Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention

Sagie Dekel, Moshe Tennenholtz, Oren Kurland · Technion - Israel Institute of Technology

Sparse block-attention mechanism (SDAG) prevents cross-document interactions in RAG to defend against corpus poisoning attacks on LLMs

Input Manipulation Attack · Prompt Injection · nlp

PDF

defense · arXiv · Feb 1, 2026

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Eliron Rahimi, Elad Hirshel, Rom Himelstein et al. · Technion - Israel Institute of Technology · Ben-Gurion University of the Negev +1 more

Defends autoregressive (AR) and diffusion LLMs against jailbreaks using an SRI signal that detects incomplete internal recovery, with 100× lower overhead

Prompt Injection · nlp

PDF Code

defense · arXiv · Nov 13, 2025

Tight Robustness Certification through the Convex Hull of $\ell_0$ Attacks

Yuval Shapira, Dana Drachsler-Cohen · Technion - Israel Institute of Technology

Certifies ℓ₀ adversarial robustness 3× faster via tighter convex hull bound propagation for few-pixel attacks

Input Manipulation Attack · vision

PDF Code

benchmark · arXiv · Nov 6, 2025

REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs

Liran Cohen, Yaniv Nemcovesky, Avi Mendelson · Technion - Israel Institute of Technology

Neighborhood loss-landscape analysis reveals residual memorization in unlearned LLMs, outperforming existing black-box membership inference methods

Membership Inference Attack · nlp

PDF

benchmark · arXiv · Nov 5, 2025

Silenced Biases: The Dark Side LLMs Learned to Refuse

Rom Himelstein, Amit LeVi, Brit Youngmann et al. · Technion - Israel Institute of Technology

Benchmark uses activation steering to bypass refusals, revealing hidden LLM biases that safety alignment masks

Prompt Injection · nlp

2 citations PDF Code

attack · arXiv · Aug 16, 2025

Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous

Ben Nassi, Stav Cohen, Or Yair · Tel Aviv University · Technion - Israel Institute of Technology +1 more

Indirect prompt injection via calendar invites and emails hijacks Gemini assistants to exfiltrate data, spam contacts, and control IoT devices

Prompt Injection · Excessive Agency · nlp

PDF
