Lior Wolf

h-index: 3 · 42 citations · 8 papers (total)

Papers in Database (1)

defense · arXiv · Nov 15, 2025

AlignTree: Efficient Defense Against LLM Jailbreak Attacks

Gil Goren, Shahar Katz, Lior Wolf · Tel Aviv University

Defends LLMs against jailbreak attacks by monitoring internal activations with a random forest that combines refusal-direction and SVM signals.

Prompt Injection · nlp
1 citation · PDF · Code