Marta Guimaraes

defense · arXiv · Oct 14, 2025

Ruben Belo, Marta Guimaraes, Claudia Soares · Universidade NOVA de Lisboa · Neuraspace

Defends LLMs against jailbreaks by projecting out harmful concept directions from latent representations at inference time

Prompt Injection · nlp
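The summary describes removing "harmful concept directions" from latent representations. The sketch below is a minimal, generic illustration of linear projection removal, assuming the concept directions are already available as plain vectors. It is not the authors' implementation: how the directions are obtained, which layers are edited, and whether the removal is an exact orthogonal projection are not stated in this listing. The names `orthonormalize` and `project_out` are hypothetical, and plain Python lists stand in for model activation tensors.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(directions: Sequence[Sequence[float]], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: turn possibly non-orthogonal directions into an orthonormal basis."""
    basis: List[Vector] = []
    for d in directions:
        v = list(d)
        # Subtract components along the basis vectors found so far.
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > eps:  # skip directions already spanned by the basis
            basis.append([vi / norm for vi in v])
    return basis


def project_out(h: Sequence[float], directions: Sequence[Sequence[float]]) -> Vector:
    """Return h minus its component in span(directions): h - sum_i (h . u_i) u_i.

    Hypothetical helper: h stands in for one hidden-state vector, and
    directions for assumed "concept" vectors of the same dimension.
    """
    out = list(h)
    for u in orthonormalize(directions):
        c = dot(out, u)
        out = [oi - c * ui for oi, ui in zip(out, u)]
    return out
```

For example, `project_out([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0]])` returns `[0.0, 4.0, 5.0]`: the component along the single direction is zeroed and everything orthogonal to it is left untouched. Orthonormalizing first means several overlapping directions can be removed together without double-subtracting their shared component.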
