Fatmazohra Rezkellah

defense · arXiv · Oct 3, 2025

Fatmazohra Rezkellah, Ramzi Dakhmouche · Université Paris-Dauphine · EPFL +1 more

Defends LLMs against jailbreaking and unlearns sensitive content via minimal, constrained weight interventions, with no classifier required

Prompt Injection · Sensitive Information Disclosure · nlp

2 citations

Papers in Database (1)