Nils Durner

h-index: 0 · 0 citations · 1 paper (total)

Papers in Database (1)

benchmark arXiv Sep 25, 2025 · Sep 2025

In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b

Nils Durner

Systematically measures sociopragmatic jailbreaks and RAG context exfiltration in gpt-oss-20b, achieving a 97.5% bypass rate on ZIP-bomb tasks.

Prompt Injection Sensitive Information Disclosure nlp