Kalina Bontcheva

benchmark · arXiv · Oct 14, 2025

A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation

João A. Leite, Arnav Arora, Silvia Gargova et al. · University of Sheffield · University of Copenhagen · and 2 other institutions

Red-teams 8 LLMs with persona-targeted disinformation prompts across 4 languages, finding that simple personalisation raises jailbreak rates by up to 10 percentage points

Prompt Injection · nlp

1 citation · 1 influential

attack · arXiv · Jan 23, 2026

LLM-Based Adversarial Persuasion Attacks on Fact-Checking Systems

João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva et al. · University of Sheffield

Attacks fact-checking classifiers with LLM-generated persuasive rewrites of claims, using 15 persuasion techniques to drive accuracy to near zero

Input Manipulation Attack · nlp
