Matteo Prandi

Papers in Database (1)

benchmark · arXiv · Apr 20, 2026

Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

Marcello Galisai, Susanna Cifani, Francesco Giarrusso et al. · Sapienza University of Rome · DEXAI +1 more

Benchmark showing a 55.75% jailbreak success rate across 31 LLMs, using humanities-style prompt obfuscation to evade safety guardrails

Prompt Injection nlp