Latest papers

10 papers
tool arXiv Feb 25, 2026

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Idan Habler, Vineeth Sai Narajala, Stav Koren et al. · Cisco · OWASP +1 more

Open-source scanner (hubscan) detecting adversarially crafted hub documents injected into RAG vector databases to hijack LLM context

Data Poisoning Attack Prompt Injection nlp multimodal
PDF Code
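The k-occurrence statistic behind hubness detection can be sketched in a few lines: a document that lands in an unusually large fraction of queries' top-k retrieval lists is a candidate adversarial hub. This is a minimal toy sketch of that idea, not hubscan's actual API; the outlier rule and all names here are illustrative.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def k_occurrence(docs, queries, k=3):
    """N_k score: how often each document appears in a query's top-k neighbours."""
    counts = [0] * len(docs)
    for q in queries:
        top = sorted(range(len(docs)), key=lambda i: -cosine(q, docs[i]))[:k]
        for i in top:
            counts[i] += 1
    return counts

random.seed(0)
dim = 16
docs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
center = [random.gauss(0, 1) for _ in range(dim)]
queries = [[c + random.gauss(0, 0.3) for c in center] for _ in range(50)]
docs.append(center)  # planted hub: crafted to sit close to every query at once

counts = k_occurrence(docs, queries)
mean_count = sum(counts) / len(counts)
is_hub = counts[-1] > 3 * mean_count  # crude outlier rule on the N_k distribution
```

A real scanner would run this against the production embedding model and flag statistical outliers in the N_k distribution rather than use a fixed multiplier.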
defense arXiv Feb 18, 2026

Protecting the Undeleted in Machine Unlearning

Aloni Cohen, Refael Kohen, Kobbi Nissim et al. · University of Chicago · Tel Aviv University +1 more

Demonstrates that perfect-retraining unlearning leaks undeleted users' data; proposes a new security definition to prevent reconstruction attacks via deletion requests

Model Inversion Attack
PDF
survey arXiv Jan 14, 2026

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism

Oleg Brodt, Elad Feldman, Bruce Schneier et al. · Ben-Gurion University of the Negev · Tel Aviv University +2 more

Surveys 36 LLM attack incidents and proposes a seven-stage promptware kill chain mapping prompt injection to multi-step malware delivery

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Nov 26, 2025

SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection

Ido Nitzan Hidekel, Gal Lifshitz, Khen Cohen et al. · Tel Aviv University

Novel frequency-contrastive framework detects deepfake audio by disentangling and aligning high-frequency residuals via cross-attention and JS contrastive loss

Output Integrity Attack audio
PDF
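The residual idea can be sketched with a simple high-pass: subtract a low-pass (moving-average) version of a signal and measure what remains. SONAR itself learns the decomposition with cross-attention and a contrastive loss; this toy sketch only shows that signals differing in high-frequency content separate cleanly on residual energy, which is the kind of feature a detector can exploit.

```python
import math
import random

def high_freq_residual(signal, win=5):
    """Subtract a moving-average low-pass; what remains is the high-frequency residual."""
    half = win // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out

def residual_energy(signal):
    res = high_freq_residual(signal)
    return sum(x * x for x in res) / len(res)

random.seed(0)
t = [i / 8000 for i in range(2000)]
smooth = [math.sin(2 * math.pi * 200 * x) for x in t]    # band-limited tone: little high-band content
jagged = [s + 0.2 * random.gauss(0, 1) for s in smooth]  # same tone plus broadband detail
```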
defense arXiv Nov 15, 2025

AlignTree: Efficient Defense Against LLM Jailbreak Attacks

Gil Goren, Shahar Katz, Lior Wolf · Tel Aviv University

Defends LLMs against jailbreaks by monitoring internal activations with a random forest combining refusal direction and SVM signals

Prompt Injection nlp
1 citation PDF Code
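The refusal-direction signal at the core of this kind of monitor can be sketched as a projection: a harmful prompt whose hidden state has lost its component along the refusal direction is suspicious. AlignTree combines such signals with SVM scores inside a random forest; the sketch below shows only a single projection-and-threshold check, with toy activations and an illustrative threshold.

```python
import math
import random

def project(activation, direction):
    """Scalar projection of a hidden state onto the refusal direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm

def flag_jailbreak(activation, direction, tau=0.5):
    # A successful jailbreak suppresses the refusal component, so a low
    # projection on a harmful prompt is treated as suspicious.
    return project(activation, direction) < tau

random.seed(0)
dim = 32
refusal_dir = [random.gauss(0, 1) for _ in range(dim)]
# Toy activations: a refusing state carries a strong component along the
# direction; a jailbroken state has had that component stripped away.
refusing = [2 * d + random.gauss(0, 0.1) for d in refusal_dir]
jailbroken = [random.gauss(0, 0.1) for _ in range(dim)]
```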
attack TPS-ISA Oct 21, 2025

Exploring Membership Inference Vulnerabilities in Clinical Large Language Models

Alexander Nemecek, Zebin Yun, Zahra Rahmani et al. · Case Western Reserve University · Tel Aviv University

Evaluates membership inference attacks on clinical LLMs fine-tuned on EHR data using loss-based and paraphrase-perturbation methods

Membership Inference Attack Sensitive Information Disclosure nlp
PDF
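The loss-based variant of membership inference reduces to a threshold test: records the model was trained on tend to incur lower loss than unseen records. A minimal sketch, using an add-one-smoothed unigram model as a stand-in for a fine-tuned clinical LLM; the sentences and threshold are illustrative, and a real attack would calibrate the threshold on reference data rather than peeking at both samples as done here.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Toy stand-in for a fine-tuned LM: an add-one-smoothed unigram model."""
    counts = Counter(w for s in corpus for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def avg_nll(model, sentence):
    words = sentence.split()
    return -sum(math.log(model(w)) for w in words) / len(words)

def is_member(model, sentence, tau):
    # Loss-based MIA: training records tend to sit below the loss threshold.
    return avg_nll(model, sentence) < tau

records = [
    "patient presents with acute chest pain",
    "chest pain radiating to left arm",
    "patient discharged with pain medication",
]
model = train_unigram(records)
member = records[0]
non_member = "subject reports mild seasonal allergies"
tau = (avg_nll(model, member) + avg_nll(model, non_member)) / 2
```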
defense arXiv Oct 15, 2025

NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

Nir Goren, Oren Katzir, Abhinav Nakarmi et al. · Tel Aviv University · University of Michigan

Distortion-free diffusion watermarking exploits seed-output correlation and ZK proofs to verify generated image authorship without model weights

Output Integrity Attack vision generative
PDF
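The seed–output correlation test can be sketched without any diffusion machinery: regenerate the initial noise from the claimed seed and correlate it with the output; only the true author's seed produces a strong correlation. Everything below is a toy stand-in (the "sampler" is just noise mixing), not the paper's scheme, which additionally wraps verification in zero-knowledge proofs.

```python
import math
import random

def seed_noise(seed, n=256):
    """Deterministic initial noise derived from the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n)]

def toy_generate(seed, n=256):
    # Stand-in for a diffusion sampler: the output stays correlated with the
    # initial noise drawn from the claimed seed.
    random.seed(0)
    return [0.7 * x + 0.3 * random.gauss(0, 1) for x in seed_noise(seed, n)]

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

image = toy_generate(seed=42)
claim_true = pearson(image, seed_noise(42))  # author's seed: strong correlation
claim_false = pearson(image, seed_noise(7))  # wrong seed: near zero
```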
benchmark arXiv Sep 25, 2025

No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks

Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum et al. · Tel Aviv University · Ben-Gurion University of the Negev +1 more

Theoretically proves reconstruction attacks on neural networks are fundamentally unreliable without prior data knowledge, and that better-trained models leak less

Model Inversion Attack vision
PDF
defense arXiv Sep 16, 2025

Sy-FAR: Symmetry-based Fair Adversarial Robustness

Haneen Najjar, Eyal Ronen, Mahmood Sharif · Tel Aviv University

Defends face recognition against adversarial attacks by enforcing symmetry in inter-class attack success rates rather than perfect fairness parity

Input Manipulation Attack vision
PDF
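The symmetry objective can be made concrete with a pairwise asymmetry score over an attack-success matrix S, where S[i][j] is the rate at which class i is perturbed into class j: symmetry asks S[i][j] ≈ S[j][i], not that all rates be equal. A minimal sketch with made-up matrices:

```python
def pairwise_asymmetry(S):
    """Mean |S[i][j] - S[j][i]| over class pairs: the gap a symmetry-based
    defence drives toward zero, instead of demanding equal rates everywhere."""
    n = len(S)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(S[i][j] - S[j][i]) for i, j in pairs) / len(pairs)

# Toy attack-success matrices: S[i][j] = rate of perturbing class i into class j.
skewed = [
    [0.0, 0.9, 0.2],
    [0.1, 0.0, 0.3],
    [0.2, 0.4, 0.0],
]
balanced = [
    [0.0, 0.5, 0.3],
    [0.5, 0.0, 0.4],
    [0.3, 0.4, 0.0],
]
```

Note that `balanced` is symmetric yet its rates still differ across pairs, which is exactly the relaxation over strict fairness parity.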
attack arXiv Aug 16, 2025

Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous

Ben Nassi, Stav Cohen, Or Yair · Tel Aviv University · Technion - Israel Institute of Technology +1 more

Indirect prompt injection via calendar invites and emails hijacks Gemini assistants to exfiltrate data, spam contacts, and control IoT devices

Prompt Injection Excessive Agency nlp
PDF