Sungho Keum

h-index: 1 · 1 citation · 2 papers (total)

Papers in Database (1)

attack · arXiv · Feb 19, 2026 · 6w ago

Discovering Universal Activation Directions for PII Leakage in Language Models

Leo Marchyok, Zachary Coalson, Sungho Keum et al. · Oregon State University · Korea Advanced Institute of Science & Technology

Discovers universal activation directions in LLM residual streams that reliably amplify PII leakage beyond existing prompt-based extraction attacks

Model Inversion Attack · Sensitive Information Disclosure · nlp