Hoang Phan

defense · EMNLP · Sep 29, 2025

Hoang Phan, Victor Li, Qi Lei · New York University

Inference-time jailbreak defense using progressive self-reflection reduces LLM attack success rates from ~80% to under 6%

Prompt Injection · nlp

1 citation

Papers in Database (1)