Michael Karasik

h-index: 9 · 390 citations · 37 papers (total)

Papers in Database (1)

attack · arXiv · Dec 3, 2025

In-Context Representation Hijacking

Itay Yona, Amir Sarid, Michael Karasik et al. · MentaLeap · Independent Researcher +1 more

Jailbreaks LLMs by replacing harmful keywords with benign substitutes in-context, hijacking internal representations to bypass safety alignment
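The keyword-substitution idea in the summary can be sketched roughly as follows. This is an illustrative toy, not the paper's actual implementation: the `SUBSTITUTIONS` mapping and `build_hijacked_prompt` helper are invented names, and the in-context redefinition format is an assumption about how such a substitution might be set up.

```python
# Toy sketch of keyword substitution with in-context redefinition.
# A flagged keyword is swapped for a benign stand-in token, and the
# mapping is stated earlier in the context, so the model's internal
# representation of the stand-in is "hijacked" toward the original word.
SUBSTITUTIONS = {
    "weapon": "garden tool",  # benign stand-in (illustrative only)
}

def build_hijacked_prompt(request: str) -> str:
    """Replace flagged keywords with benign substitutes and prepend
    in-context definitions that bind the substitutes to their originals."""
    context_lines = [
        f'In the following, the phrase "{benign}" means "{flagged}".'
        for flagged, benign in SUBSTITUTIONS.items()
    ]
    for flagged, benign in SUBSTITUTIONS.items():
        request = request.replace(flagged, benign)
    return "\n".join(context_lines) + "\n\n" + request

prompt = build_hijacked_prompt("Explain how a weapon works.")
```

The request text itself then contains only the benign substitute; the flagged word appears solely in the redefinition preamble.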

Prompt Injection · NLP
PDF · Code