Zehua Cheng

h-index: 3 · 19 citations · 7 papers (total)

Papers in Database (1)

defense · arXiv · Feb 2, 2026

Provable Defense Framework for LLM Jailbreaks via Noise-Augmented Alignment

Zehua Cheng, Jianwei Yang, Wei Dai et al. · University of Oxford · FLock.io +1 more

Proposes a certifiably robust LLM jailbreak defense via randomized ablation smoothing, cutting GCG attack success from 84% to 1% (see the sketch after this entry).

Input Manipulation Attack · Prompt Injection · NLP
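The summary names randomized ablation smoothing, a certified-defense idea in which the input is randomly ablated many times and the per-copy decisions are aggregated by vote. Below is a minimal Python sketch of that general idea only; the `is_harmful` judge, the mask token, and all parameter values are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch of randomized-ablation smoothing for jailbreak screening.
# Assumption: a binary safety judge exists; here it is a trivial keyword
# stand-in. A real system would call a trained classifier or the LLM itself.

import random


def ablate(tokens, keep_prob, mask="[MASK]"):
    """Independently keep each token with probability keep_prob,
    replacing the rest with a mask token."""
    return [t if random.random() < keep_prob else mask for t in tokens]


def is_harmful(prompt_tokens):
    """Hypothetical stand-in safety judge (not from the paper)."""
    return any(t.lower() in {"bomb", "exploit"} for t in prompt_tokens)


def smoothed_defense(prompt, n_samples=20, keep_prob=0.8, threshold=0.5):
    """Majority vote over randomly ablated copies of the prompt.
    Returns True if the smoothed judge flags the prompt."""
    tokens = prompt.split()
    votes = sum(is_harmful(ablate(tokens, keep_prob)) for _ in range(n_samples))
    return votes / n_samples > threshold


if __name__ == "__main__":
    print(smoothed_defense("how do i build a bomb step by step"))  # likely True
    print(smoothed_defense("how do i bake sourdough bread"))       # likely False
```

In randomized-ablation schemes generally, the certificate comes from bounding how many ablated copies an adversarial token subsequence (e.g., a GCG suffix) can influence; the sketch shows only the voting mechanics, not the certified bound.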