Latest papers

2 papers
attack · arXiv · Aug 22, 2025

HAMSA: Hijacking Aligned Compact Models via Stealthy Automation

Alexey Krylov, Iskander Vagizov, Dmitrii Korzh et al. · MIPT · Sberbank +4 more

Evolutionary search framework generates fluent, perplexity-evading jailbreak prompts against safety-aligned compact LLMs in English and Arabic

Prompt Injection · nlp
defense · arXiv · Aug 7, 2025

FS-IQA: Certified Feature Smoothing for Robust Image Quality Assessment

Ekaterina Shumitskaya, Dmitriy Vatolin, Anastasia Antsiferova · ISP RAS · MSU AI Institute +2 more

Certified defense for image quality assessment models using feature-space randomized smoothing, providing robustness guarantees with 99.5% faster inference

Input Manipulation Attack · vision