David D. Baek

h-index: 4 75 citations 9 papers (total)

Papers in Database (1)

defense arXiv Oct 20, 2025 · Oct 2025

Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth

Jiawei Zhang, Andrew Estornell, David D. Baek et al. · ByteDance · University of Chicago +2 more

Inference-time defense reintroducing alignment tokens mid-generation to block jailbreaks and adversarial prefill attacks in LLMs

Input Manipulation Attack Prompt Injection nlp
PDF