Alexander Robey

h-index: 21 · 3,804 citations · 45 papers (total)

Papers in Database (2)

attack · arXiv · Nov 5, 2025

Jailbreaking in the Haystack

Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena et al. · Carnegie Mellon University

NINJA jailbreaks long-context LLMs by burying harmful goals in benign haystack content, exploiting positional safety blind spots.

Prompt Injection · nlp
2 citations
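The summary above can be illustrated with a minimal sketch of the haystack construction it describes; the filler passages, placeholder goal string, and insertion-depth parameter below are illustrative assumptions, not the paper's actual prompts or method:

```python
# Illustrative sketch: bury a target instruction inside long benign
# "haystack" filler, as the NINJA summary describes. All strings and the
# insertion depth are hypothetical placeholders, not the paper's prompts.

def build_haystack_prompt(goal: str, filler: list[str], depth: float = 0.5) -> str:
    """Insert `goal` at relative position `depth` (0.0 = start, 1.0 = end)
    within a sequence of benign filler passages."""
    idx = int(len(filler) * depth)
    passages = filler[:idx] + [goal] + filler[idx:]
    return "\n\n".join(passages)

# Build a long context and place the goal three-quarters of the way in.
filler = [f"Benign passage {i} about an unrelated topic." for i in range(100)]
prompt = build_haystack_prompt("<target instruction>", filler, depth=0.75)
```

Varying `depth` is what would probe the positional blind spots the summary refers to: the same goal text is scored differently by safety filters depending on where it sits in the context.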
defense · arXiv · Sep 23, 2025

Algorithms for Adversarially Robust Deep Learning

Alexander Robey · University of Pennsylvania

PhD thesis proposing new algorithms for adversarial robustness, covering both vision models and LLM jailbreak attacks and defenses.

Input Manipulation Attack · Prompt Injection · vision · nlp
1 citation