Alexander Robey

h-index: 21 · 3,804 citations · 45 papers (total)

Papers in Database (2)

attack · arXiv · Nov 5, 2025

Jailbreaking in the Haystack

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena et al. · Carnegie Mellon University

NINJA jailbreaks long-context LLMs by burying harmful goals in benign haystack content, exploiting positional safety blind spots.

Prompt Injection · nlp
2 citations
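The summary above can be illustrated with a minimal sketch of the haystack construction it describes; the filler passages, placeholder goal string, and insertion-depth parameter below are illustrative assumptions, not the paper's actual prompts or method:

```python
# Illustrative sketch: bury a target instruction inside long benign
# "haystack" filler, as the NINJA summary describes. All strings and the
# insertion depth are hypothetical placeholders, not the paper's prompts.

def build_haystack_prompt(goal: str, filler: list[str], depth: float = 0.5) -> str:
    """Insert `goal` at relative position `depth` (0.0 = start, 1.0 = end)
    within a sequence of benign filler passages."""
    idx = int(len(filler) * depth)
    passages = filler[:idx] + [goal] + filler[idx:]
    return "\n\n".join(passages)

# Build a long context and place the goal three-quarters of the way in.
filler = [f"Benign passage {i} about an unrelated topic." for i in range(100)]
prompt = build_haystack_prompt("<target instruction>", filler, depth=0.75)
```

Varying `depth` is what would probe the positional blind spots the summary refers to: the same goal text is scored differently by safety filters depending on where it sits in the context.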
defense · arXiv · Sep 23, 2025

Algorithms for Adversarially Robust Deep Learning

Alexander Robey · University of Pennsylvania

PhD thesis proposing new algorithms for adversarial robustness, covering both vision models and LLM jailbreak attacks and defenses.

Input Manipulation Attack · Prompt Injection · vision · nlp
1 citation