Federico Cinus

defense · arXiv · Dec 7, 2025

Jehyeok Yeon, Federico Cinus, Yifan Wu et al. · University of Illinois Urbana-Champaign · University of Southern California +1 more

Proposes graph-regularized sparse autoencoders that capture distributed LLM safety representations for adaptive jailbreak defense, with an 82% refusal rate.

Prompt Injection · nlp

1 citation

Papers in Database (1)