Arnav Arora

Benchmark · arXiv · Oct 14, 2025

A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation

João A. Leite, Arnav Arora, Silvia Gargova et al. · University of Sheffield · University of Copenhagen · and two other institutions

Red-teams 8 LLMs with persona-targeted disinformation prompts across 4 languages and finds that simple personalisation raises jailbreak rates by up to 10 percentage points.

Tags: Prompt Injection, NLP

1 citation · 1 influential citation
