Latest papers

1 paper
benchmark · arXiv · Jan 25, 2026

Prompt Injection Evaluations: Refusal Boundary Instability and Artifact-Dependent Compliance in GPT-4-Series Models

Thomas Heverin · The Baldwin School

Benchmarks the stability of prompt injection refusals in GPT-4-series models, showing that one-third of refusals can be bypassed via structured perturbations, with artifact type as the key predictor.

Prompt Injection nlp