Thomas Heverin

h-index: 0 · 0 citations · 1 paper (total)

Papers in Database (1)

benchmark · arXiv · Jan 25, 2026

Prompt Injection Evaluations: Refusal Boundary Instability and Artifact-Dependent Compliance in GPT-4-Series Models

Thomas Heverin · The Baldwin School

Benchmarks the stability of GPT-4-series refusals under prompt injection, showing that one-third of refusals can be bypassed via structured perturbations, with artifact type as the key predictor of compliance.

Prompt Injection · NLP