Yejin Lee

Papers in Database (1)

attack arXiv Apr 2, 2026

CRaFT: Circuit-Guided Refusal Feature Selection via Cross-Layer Transcoders

Su-Hyeon Kim, Hyundong Jin, Yejin Lee et al. · Yonsei University

Circuit-guided feature selection for LLM jailbreaking that identifies causal refusal features via cross-layer transcoders and boundary prompts

Prompt Injection nlp