Ruizhe Li

h-index: 2 · 10 citations · 10 papers (total)

Papers in Database (1)

defense arXiv Feb 21, 2026 · 6w ago

MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs

Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav et al.

Diffusion-based defense projects LLM hidden states onto benign manifolds at inference time to neutralize jailbreak attacks

Input Manipulation Attack · Prompt Injection · nlp