Latest papers

2 papers
defense arXiv Apr 28, 2026

Cross-Lingual Jailbreak Detection via Semantic Codebooks

Shirin Alanova, Bogdan Minko, Sabrina Sadiekh et al. · ITMO University · HiveTraceLab

A training-free jailbreak detector that matches multilingual embeddings against an English codebook; effective on template-based attacks but degrades on more diverse ones

Prompt Injection nlp multimodal
attack arXiv Oct 15, 2025

Selective Adversarial Attacks on LLM Benchmarks

Ivan Dubrovsky, Anastasia Orlova, Illarion Iov et al. · ITMO University · Applied AI Institute

Selective word-level adversarial attacks on MMLU questions that degrade one target LLM's benchmark score while leaving competing models unaffected

Prompt Injection nlp