Francesco Giarrusso

h-index: 2 · 13 citations · 5 papers (total)

Papers in Database (3)

attack · arXiv · Nov 19, 2025

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Piercosma Bisconti, Matteo Prandi, Federico Pierucci et al. · DEXAI – Icaro Lab · Sapienza University of Rome +2 more

Adversarial poetry jailbreaks 25 frontier LLMs with 62% average success rate, exposing a universal stylistic bypass of safety alignment

Prompt Injection nlp
9 citations · 1 influential

benchmark · arXiv · Oct 14, 2025

Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection

Francesco Giarrusso, Olga E. Sorokoletova, Vincenzo Suriani et al. · Sapienza University of Rome

Proposes a 7-family jailbreak taxonomy, Italian multi-turn dataset, and GPT-5 detection benchmark for LLM safety

Prompt Injection nlp
2 citations

attack · arXiv · Dec 16, 2025

From Adversarial Poetry to Adversarial Tales: An Interpretability Research Agenda

Piercosma Bisconti, Marcello Galisai, Matteo Prandi et al. · Sapienza University of Rome · VU Amsterdam +1 more

Novel jailbreak embeds harmful content in cyberpunk tales using Proppian analysis to bypass LLM safety, achieving 71.3% ASR across 26 models

Prompt Injection nlp
1 citation