Pavlos Ntais

h-index: 1 · 1 citation · 1 paper (total)

Papers in Database (1)

attack · arXiv · Oct 24, 2025

Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models

Pavlos Ntais · University of Athens

Trains a compact, LoRA-tuned Mistral-7B model to automatically generate narrative jailbreaks, achieving an 81% attack success rate (ASR) against GPT-OSS-20B and 66.5% against GPT-4.

Prompt Injection nlp
1 citation · PDF