Chen Henry Wu

attack · arXiv · Nov 5, 2025

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena et al. · Carnegie Mellon University

NINJA jailbreaks long-context LLMs by burying harmful goals in benign haystack content, exploiting positional safety blind spots

Prompt Injection · nlp

2 citations

Papers in Database (1)