Latest papers

10 papers
tool arXiv Feb 25, 2026

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Idan Habler, Vineeth Sai Narajala, Stav Koren et al. · Cisco · OWASP +1 more

Open-source scanner (hubscan) detecting adversarially crafted hub documents injected into RAG vector databases to hijack LLM context

Data Poisoning Attack Prompt Injection nlp multimodal
PDF Code
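The k-occurrence statistic behind hubness detection can be sketched in a few lines: a document that lands in an unusually large fraction of queries' top-k retrieval lists is a candidate adversarial hub. This is a minimal toy sketch of that idea, not hubscan's actual API; the outlier rule and all names here are illustrative.

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def k_occurrence(docs, queries, k=3):
    """N_k score: how often each document appears in a query's top-k neighbours."""
    counts = [0] * len(docs)
    for q in queries:
        top = sorted(range(len(docs)), key=lambda i: -cosine(q, docs[i]))[:k]
        for i in top:
            counts[i] += 1
    return counts

random.seed(0)
dim = 16
docs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
center = [random.gauss(0, 1) for _ in range(dim)]
queries = [[c + random.gauss(0, 0.3) for c in center] for _ in range(50)]
docs.append(center)  # planted hub: crafted to sit close to every query at once

counts = k_occurrence(docs, queries)
mean_count = sum(counts) / len(counts)
is_hub = counts[-1] > 3 * mean_count  # crude outlier rule on the N_k distribution
```

A real scanner would run this against the production embedding model and flag statistical outliers in the N_k distribution rather than use a fixed multiplier.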
defense arXiv Feb 18, 2026

Protecting the Undeleted in Machine Unlearning

Aloni Cohen, Refael Kohen, Kobbi Nissim et al. · University of Chicago · Tel Aviv University +1 more

Demonstrates that perfect-retraining unlearning leaks undeleted users' data; proposes a new security definition to prevent reconstruction attacks via deletion requests

Model Inversion Attack
PDF
survey arXiv Jan 14, 2026

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism

Oleg Brodt, Elad Feldman, Bruce Schneier et al. · Ben-Gurion University of the Negev · Tel Aviv University +2 more

Surveys 36 LLM attack incidents and proposes a seven-stage promptware kill chain mapping prompt injection to multi-step malware delivery

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Nov 26, 2025

SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection

Ido Nitzan Hidekel, Gal Lifshitz, Khen Cohen et al. · Tel Aviv University

Novel frequency-contrastive framework detects deepfake audio by disentangling and aligning high-frequency residuals via cross-attention and JS contrastive loss

Output Integrity Attack audio
PDF
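The residual idea can be sketched with a simple high-pass: subtract a low-pass (moving-average) version of a signal and measure what remains. SONAR itself learns the decomposition with cross-attention and a contrastive loss; this toy sketch only shows that signals differing in high-frequency content separate cleanly on residual energy, which is the kind of feature a detector can exploit.

```python
import math
import random

def high_freq_residual(signal, win=5):
    """Subtract a moving-average low-pass; what remains is the high-frequency residual."""
    half = win // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out

def residual_energy(signal):
    res = high_freq_residual(signal)
    return sum(x * x for x in res) / len(res)

random.seed(0)
t = [i / 8000 for i in range(2000)]
smooth = [math.sin(2 * math.pi * 200 * x) for x in t]    # band-limited tone: little high-band content
jagged = [s + 0.2 * random.gauss(0, 1) for s in smooth]  # same tone plus broadband detail
```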
defense arXiv Nov 15, 2025

AlignTree: Efficient Defense Against LLM Jailbreak Attacks

Gil Goren, Shahar Katz, Lior Wolf · Tel Aviv University

Defends LLMs against jailbreaks by monitoring internal activations with a random forest combining refusal direction and SVM signals

Prompt Injection nlp
1 citation PDF Code
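The refusal-direction signal at the core of this kind of monitor can be sketched as a projection: a harmful prompt whose hidden state has lost its component along the refusal direction is suspicious. AlignTree combines such signals with SVM scores inside a random forest; the sketch below shows only a single projection-and-threshold check, with toy activations and an illustrative threshold.

```python
import math
import random

def project(activation, direction):
    """Scalar projection of a hidden state onto the refusal direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm

def flag_jailbreak(activation, direction, tau=0.5):
    # A successful jailbreak suppresses the refusal component, so a low
    # projection on a harmful prompt is treated as suspicious.
    return project(activation, direction) < tau

random.seed(0)
dim = 32
refusal_dir = [random.gauss(0, 1) for _ in range(dim)]
# Toy activations: a refusing state carries a strong component along the
# direction; a jailbroken state has had that component stripped away.
refusing = [2 * d + random.gauss(0, 0.1) for d in refusal_dir]
jailbroken = [random.gauss(0, 0.1) for _ in range(dim)]
```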
attack TPS-ISA Oct 21, 2025

Exploring Membership Inference Vulnerabilities in Clinical Large Language Models

Alexander Nemecek, Zebin Yun, Zahra Rahmani et al. · Case Western Reserve University · Tel Aviv University

Evaluates membership inference attacks on clinical LLMs fine-tuned on EHR data using loss-based and paraphrase-perturbation methods

Membership Inference Attack Sensitive Information Disclosure nlp
PDF
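The loss-based variant of membership inference reduces to a threshold test: records the model was trained on tend to incur lower loss than unseen records. A minimal sketch, using an add-one-smoothed unigram model as a stand-in for a fine-tuned clinical LLM; the sentences and threshold are illustrative, and a real attack would calibrate the threshold on reference data rather than peeking at both samples as done here.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Toy stand-in for a fine-tuned LM: an add-one-smoothed unigram model."""
    counts = Counter(w for s in corpus for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def avg_nll(model, sentence):
    words = sentence.split()
    return -sum(math.log(model(w)) for w in words) / len(words)

def is_member(model, sentence, tau):
    # Loss-based MIA: training records tend to sit below the loss threshold.
    return avg_nll(model, sentence) < tau

records = [
    "patient presents with acute chest pain",
    "chest pain radiating to left arm",
    "patient discharged with pain medication",
]
model = train_unigram(records)
member = records[0]
non_member = "subject reports mild seasonal allergies"
tau = (avg_nll(model, member) + avg_nll(model, non_member)) / 2
```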
defense arXiv Oct 15, 2025

NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

Nir Goren, Oren Katzir, Abhinav Nakarmi et al. · Tel Aviv University · University of Michigan

Distortion-free diffusion watermarking exploits seed-output correlation and ZK proofs to verify generated image authorship without model weights

Output Integrity Attack vision generative
PDF
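The seed–output correlation test can be sketched without any diffusion machinery: regenerate the initial noise from the claimed seed and correlate it with the output; only the true author's seed produces a strong correlation. Everything below is a toy stand-in (the "sampler" is just noise mixing), not the paper's scheme, which additionally wraps verification in zero-knowledge proofs.

```python
import math
import random

def seed_noise(seed, n=256):
    """Deterministic initial noise derived from the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n)]

def toy_generate(seed, n=256):
    # Stand-in for a diffusion sampler: the output stays correlated with the
    # initial noise drawn from the claimed seed.
    random.seed(0)
    return [0.7 * x + 0.3 * random.gauss(0, 1) for x in seed_noise(seed, n)]

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

image = toy_generate(seed=42)
claim_true = pearson(image, seed_noise(42))  # author's seed: strong correlation
claim_false = pearson(image, seed_noise(7))  # wrong seed: near zero
```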
benchmark arXiv Sep 25, 2025

No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks

Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum et al. · Tel Aviv University · Ben-Gurion University of the Negev +1 more

Theoretically proves reconstruction attacks on neural networks are fundamentally unreliable without prior data knowledge, and that better-trained models leak less

Model Inversion Attack vision
PDF
defense arXiv Sep 16, 2025

Sy-FAR: Symmetry-based Fair Adversarial Robustness

Haneen Najjar, Eyal Ronen, Mahmood Sharif · Tel Aviv University

Defends face recognition against adversarial attacks by enforcing symmetry in inter-class attack success rates rather than perfect fairness parity

Input Manipulation Attack vision
PDF
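The symmetry objective can be made concrete with a pairwise asymmetry score over an attack-success matrix S, where S[i][j] is the rate at which class i is perturbed into class j: symmetry asks S[i][j] ≈ S[j][i], not that all rates be equal. A minimal sketch with made-up matrices:

```python
def pairwise_asymmetry(S):
    """Mean |S[i][j] - S[j][i]| over class pairs: the gap a symmetry-based
    defence drives toward zero, instead of demanding equal rates everywhere."""
    n = len(S)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(S[i][j] - S[j][i]) for i, j in pairs) / len(pairs)

# Toy attack-success matrices: S[i][j] = rate of perturbing class i into class j.
skewed = [
    [0.0, 0.9, 0.2],
    [0.1, 0.0, 0.3],
    [0.2, 0.4, 0.0],
]
balanced = [
    [0.0, 0.5, 0.3],
    [0.5, 0.0, 0.4],
    [0.3, 0.4, 0.0],
]
```

Note that `balanced` is symmetric yet its rates still differ across pairs, which is exactly the relaxation over strict fairness parity.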
attack arXiv Aug 16, 2025

Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous

Ben Nassi, Stav Cohen, Or Yair · Tel Aviv University · Technion - Israel Institute of Technology +1 more

Indirect prompt injection via calendar invites and emails hijacks Gemini assistants to exfiltrate data, spam contacts, and control IoT devices

Prompt Injection Excessive Agency nlp
PDF