Zehua Cheng

h-index: 3 · 19 citations · 7 papers (total)

Papers in Database (1)

defense · arXiv · Feb 2, 2026

Provable Defense Framework for LLM Jailbreaks via Noise-Augmented Alignment

Zehua Cheng, Jianwei Yang, Wei Dai et al. · University of Oxford · FLock.io +1 more

Proposes a certifiably robust LLM jailbreak defense via randomized ablation smoothing, cutting GCG attack success from 84% to 1% (see the sketch after this entry).

Input Manipulation Attack · Prompt Injection · NLP
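The summary names randomized ablation smoothing, a certified-defense idea in which the input is randomly ablated many times and the per-copy decisions are aggregated by vote. Below is a minimal Python sketch of that general idea only; the `is_harmful` judge, the mask token, and all parameter values are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch of randomized-ablation smoothing for jailbreak screening.
# Assumption: a binary safety judge exists; here it is a trivial keyword
# stand-in. A real system would call a trained classifier or the LLM itself.

import random


def ablate(tokens, keep_prob, mask="[MASK]"):
    """Independently keep each token with probability keep_prob,
    replacing the rest with a mask token."""
    return [t if random.random() < keep_prob else mask for t in tokens]


def is_harmful(prompt_tokens):
    """Hypothetical stand-in safety judge (not from the paper)."""
    return any(t.lower() in {"bomb", "exploit"} for t in prompt_tokens)


def smoothed_defense(prompt, n_samples=20, keep_prob=0.8, threshold=0.5):
    """Majority vote over randomly ablated copies of the prompt.
    Returns True if the smoothed judge flags the prompt."""
    tokens = prompt.split()
    votes = sum(is_harmful(ablate(tokens, keep_prob)) for _ in range(n_samples))
    return votes / n_samples > threshold


if __name__ == "__main__":
    print(smoothed_defense("how do i build a bomb step by step"))  # likely True
    print(smoothed_defense("how do i bake sourdough bread"))       # likely False
```

In randomized-ablation schemes generally, the certificate comes from bounding how many ablated copies an adversarial token subsequence (e.g., a GCG suffix) can influence; the sketch shows only the voting mechanics, not the certified bound.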