Eric Easley

Papers in Database (1)

defense arXiv Apr 12, 2026 · 3d ago

Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs

Eric Easley, Sebastian Farquhar · University of California · University of Oxford

Defense training LLMs to reinterpret malicious instructions as benign at the representation level, blocking jailbreaks and backdoors

Model Poisoning Prompt Injection Sensitive Information Disclosure nlp
PDF