Tianlong Chen

Papers in Database (1)

defense · arXiv · Jan 5, 2025

Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense

Yang Ouyang, Hengrui Gu, Shuhang Lin et al. · North Carolina State University · Rutgers University +4 more

Defends LLMs against jailbreaks by identifying the layers that tend to generate affirmative tokens in response to harmful prompts and patching those layers via adversarial unlearning

Prompt Injection · nlp