ML Security Papers

Latest papers

2 papers

defense arXiv Apr 28, 2026

Cross-Lingual Jailbreak Detection via Semantic Codebooks

Shirin Alanova, Bogdan Minko, Sabrina Sadiekh et al. · ITMO University · HiveTraceLab

Training-free jailbreak detector that matches multilingual embeddings against an English codebook; effective on templated attacks but degrades on more diverse ones

Prompt Injection nlp multimodal

attack arXiv Oct 15, 2025

Selective Adversarial Attacks on LLM Benchmarks

Ivan Dubrovsky, Anastasia Orlova, Illarion Iov et al. · ITMO University · Applied AI Institute

Selective word-level adversarial attacks on MMLU questions that degrade one target LLM's benchmark score while leaving competing models unaffected

Prompt Injection nlp
