Daniele Nardi

h-index: 2 · 16 citations · 7 papers (total)

Papers in Database (2)

attack arXiv Nov 19, 2025 · Nov 2025

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Piercosma Bisconti, Matteo Prandi, Federico Pierucci et al. · DEXAI – Icaro Lab · Sapienza University of Rome +2 more

Adversarial poetry jailbreaks 25 frontier LLMs with 62% average success rate, exposing a universal stylistic bypass of safety alignment

Prompt Injection nlp
9 citations · 1 influential · PDF

attack arXiv Dec 16, 2025 · Dec 2025

From Adversarial Poetry to Adversarial Tales: An Interpretability Research Agenda

Piercosma Bisconti, Marcello Galisai, Matteo Prandi et al. · Sapienza University of Rome · VU Amsterdam +1 more

Novel jailbreak embeds harmful content in cyberpunk tales using Proppian analysis to bypass LLM safety, achieving 71.3% ASR across 26 models

Prompt Injection nlp
1 citation · PDF