Kevin Zhu

h-index: 4 · 228 citations · 32 papers (total)

Papers in Database (3)

defense · arXiv · Feb 21, 2026

MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs

Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav et al.

Diffusion-based defense projects LLM hidden states onto benign manifolds at inference time to neutralize jailbreak attacks

Input Manipulation Attack · Prompt Injection · nlp
PDF

defense · arXiv · Dec 12, 2025

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans

Edward Lue Chee Lip, Anthony Channg, Diana Kim et al. · Algoverse AI Research · Colorado State University +1 more

Evaluates safety protocols for multi-agent LLM systems where an untrusted decomposer can inject malicious subtask instructions undetectable by monitors

Excessive Agency · Prompt Injection · nlp
PDF · Code

defense · arXiv · Jan 18, 2026

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

Anirudh Sekar, Mrinal Agarwal, Rachel Sharma et al. · Algoverse AI Research · University of California

Defends LLM pipelines against prompt injection by detecting semantic embedding drift via cosine similarity, achieving over 93% accuracy in a zero-shot setting

Prompt Injection · nlp
PDF · Code