Ziqian Zhong

h-index: 3 · 24 citations · 6 papers (total)

Papers in Database (1)

attack · arXiv · Nov 5, 2025

Jailbreaking in the Haystack

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena et al. · Carnegie Mellon University

NINJA jailbreaks long-context LLMs by burying harmful goals in benign haystack content, exploiting positional blind spots in safety behavior.

Prompt Injection · nlp
2 citations