Federico Cinus

h-index: 5 · 154 citations · 17 papers (total)

Papers in Database (1)

defense arXiv Dec 7, 2025 · Dec 2025

GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering

Jehyeok Yeon, Federico Cinus, Yifan Wu et al. · University of Illinois Urbana-Champaign · University of Southern California +1 more

Proposes graph-regularized sparse autoencoders that capture distributed LLM safety representations for adaptive jailbreak defense, with an 82% refusal rate

Prompt Injection · nlp
1 citation · PDF