Latest papers

3 papers
benchmark · arXiv · Dec 11, 2025

Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks

Kristina Korotkova, Aleksandr Katrutsa · Moscow Institute of Physics and Technology · Skolkovo Institute of Science and Technology

Empirically compares projection-free Frank-Wolfe methods with PGD and FGSM for constructing white-box adversarial attacks under ℓ1, ℓ2, and ℓ∞ constraints

Input Manipulation Attack vision
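A minimal sketch of the projection-free idea, not the authors' code. It assumes a user-supplied function `grad` (hypothetical) that returns the gradient of the attack loss with respect to the perturbation. It uses the standard linear maximization oracle for each norm ball and the classic step size γ_t = 2/(t+2). Every iterate is a convex combination of points inside the ball, so it stays feasible without any projection. A PGD step would instead take a gradient step and then project back onto the ball.

```python
import math
from typing import Callable, List

Vector = List[float]


def _sign(x: float) -> float:
    """Return +1.0, -1.0, or 0.0 according to the sign of x."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def lmo_linf(g: Vector, eps: float) -> Vector:
    # argmax over ||s||_inf <= eps of <g, s>: a corner of the box
    return [eps * _sign(gi) for gi in g]


def lmo_l2(g: Vector, eps: float) -> Vector:
    # argmax over ||s||_2 <= eps of <g, s>: eps times the normalized gradient
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return [0.0] * len(g)
    return [eps * gi / norm for gi in g]


def lmo_l1(g: Vector, eps: float) -> Vector:
    # argmax over ||s||_1 <= eps of <g, s>: one signed coordinate (sparse vertex)
    i = max(range(len(g)), key=lambda k: abs(g[k]))
    s = [0.0] * len(g)
    s[i] = eps * _sign(g[i])
    return s


def frank_wolfe_attack(grad: Callable[[Vector], Vector], dim: int, eps: float,
                       lmo: Callable[[Vector, float], Vector],
                       steps: int = 20) -> Vector:
    """Maximize an attack loss over a norm ball with Frank-Wolfe (no projection)."""
    delta = [0.0] * dim  # the zero perturbation is feasible for every ball
    for t in range(steps):
        g = grad(delta)          # gradient of the loss w.r.t. the perturbation
        s = lmo(g, eps)          # best vertex of the ball for the linearized loss
        gamma = 2.0 / (t + 2.0)  # classic open-loop step size
        # a convex combination of two feasible points stays inside the ball
        delta = [(1.0 - gamma) * d + gamma * si for d, si in zip(delta, s)]
    return delta
```

For instance, with the toy concave loss L(δ) = −½‖δ − c‖², whose gradient is c − δ, calling `frank_wolfe_attack(lambda d: [ci - di for ci, di in zip(c, d)], 2, 0.1, lmo_linf)` with c = [1.0, -1.0] returns a point on the ℓ∞ boundary, approximately [0.1, -0.1]. Swapping in `lmo_l1` yields sparse perturbations.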
attack · Inf. Sciences · Nov 2, 2025

T-MLA: A targeted multiscale log-exponential attack framework for neural image compression

Nikolay I. Kalmykov, Razan Dibo, Kaiyu Shen et al. · Skolkovo Institute of Science and Technology · Artificial Intelligence Research Institute +1 more

Wavelet-domain adversarial attack on neural image compression that makes imperceptibly perturbed inputs produce severely degraded reconstructions

Input Manipulation Attack vision
1 citation
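The general idea of perturbing an image in the wavelet domain rather than in pixel space can be illustrated with a toy example. This sketch makes strong simplifying assumptions: a one-level 1D orthonormal Haar transform stands in for a multiscale 2D decomposition, and the perturbation `delta` is given rather than optimized. It does not attempt to reproduce the targeted log-exponential attack itself, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform of an even-length signal."""
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]  # low-pass
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]  # high-pass
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Exact inverse of haar_forward."""
    x: List[float] = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def perturb_detail(x: List[float], delta: List[float]) -> List[float]:
    """Add a perturbation to the detail (high-frequency) coefficients only."""
    approx, detail = haar_forward(x)
    detail = [d + e for d, e in zip(detail, delta)]
    return haar_inverse(approx, detail)
```

Because the transform is orthonormal, the ℓ2 norm of the perturbation is the same in both domains, and editing only the detail coefficients leaves each pairwise average (the coarse content) of the signal unchanged.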
attack · arXiv · Sep 26, 2025

The Rogue Scalpel: Activation Steering Compromises LLM Safety

Anton Korznikov, Andrey Galichin, Alexey Dontsov et al. · Skolkovo Institute of Science and Technology

Activation steering, even with random or benign sparse-autoencoder (SAE) vectors, reliably jailbreaks aligned LLMs by corrupting internal hidden states at inference time

Prompt Injection nlp
4 citations
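In its simplest form, activation steering adds a scaled direction vector to the hidden states of one layer during inference. The toy sketch below assumes that plain Python lists stand in for a model's per-token activations at a chosen layer. `steer`, `random_unit_vector`, and `alpha` are hypothetical names, and a random unit vector plays the role of a "random" steering direction. Real implementations hook into a specific transformer layer of the model, which is not shown here.

```python
import math
import random
from typing import List

Vector = List[float]


def random_unit_vector(dim: int, seed: int = 0) -> Vector:
    """Draw a direction uniformly on the unit sphere (normalized Gaussian)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def steer(hidden_states: List[Vector], direction: Vector, alpha: float) -> List[Vector]:
    """Return h + alpha * direction for every token's hidden state h."""
    out = []
    for row in hidden_states:
        if len(row) != len(direction):
            raise ValueError("direction must match the hidden size")
        out.append([h + alpha * d for h, d in zip(row, direction)])
    return out
```

The scale `alpha` controls how far each hidden state is pushed; at `alpha = 0` the activations are returned unchanged.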