Chen Henry Wu

h-index: 2 · 42 citations · 6 papers (total)

Papers in Database (1)

attack arXiv Nov 5, 2025 · Nov 2025

Jailbreaking in the Haystack

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena et al. · Carnegie Mellon University

NINJA jailbreaks long-context LLMs by burying harmful goals in benign haystack content, exploiting positional blind spots in safety behavior.

Prompt Injection nlp
2 citations