Kamilė Lukošiūtė

h-index: 4 937 citations 5 papers (total)

Papers in Database (1)

attack arXiv Oct 7, 2025 · Oct 2025

LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback

Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė et al. · University of Cagliari · Centre for AI Governance +1 more

White-box LLM jailbreak using latent-space-guided word substitutions to produce low-perplexity prompts that evade perplexity-based safety filters

Prompt Injection nlp
1 citations PDF