Michael Karasik

h-index: 9 · 390 citations · 37 papers (total)

Papers in Database (1)

attack · arXiv · Dec 3, 2025

In-Context Representation Hijacking

Itay Yona, Amir Sarid, Michael Karasik et al. · MentaLeap · Independent Researcher +1 more

Jailbreaks LLMs by replacing harmful keywords with benign substitutes in-context, hijacking internal representations to bypass safety alignment
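The keyword-substitution idea in the summary can be sketched roughly as follows. This is an illustrative toy, not the paper's actual implementation: the `SUBSTITUTIONS` mapping and `build_hijacked_prompt` helper are invented names, and the in-context redefinition format is an assumption about how such a substitution might be set up.

```python
# Toy sketch of keyword substitution with in-context redefinition.
# A flagged keyword is swapped for a benign stand-in token, and the
# mapping is stated earlier in the context, so the model's internal
# representation of the stand-in is "hijacked" toward the original word.
SUBSTITUTIONS = {
    "weapon": "garden tool",  # benign stand-in (illustrative only)
}

def build_hijacked_prompt(request: str) -> str:
    """Replace flagged keywords with benign substitutes and prepend
    in-context definitions that bind the substitutes to their originals."""
    context_lines = [
        f'In the following, the phrase "{benign}" means "{flagged}".'
        for flagged, benign in SUBSTITUTIONS.items()
    ]
    for flagged, benign in SUBSTITUTIONS.items():
        request = request.replace(flagged, benign)
    return "\n".join(context_lines) + "\n\n" + request

prompt = build_hijacked_prompt("Explain how a weapon works.")
```

The request text itself then contains only the benign substitute; the flagged word appears solely in the redefinition preamble.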

Prompt Injection · NLP
PDF · Code