ML Security Papers

Latest papers

3 papers

benchmark · arXiv · Dec 11, 2025

Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks

Kristina Korotkova, Aleksandr Katrutsa · Moscow Institute of Physics and Technology · Skolkovo Institute of Science and Technology

Empirically evaluates projection-free Frank-Wolfe methods against PGD and FGSM for constructing white-box adversarial attacks under ℓ1, ℓ2, and ℓ∞ constraints

Input Manipulation Attack · vision

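The Frank-Wolfe entry above contrasts projection-free methods with projected attacks such as PGD. As a rough illustration of what "projection-free" means, here is a textbook Frank-Wolfe ascent loop in plain Python. It is a sketch under stated assumptions, not the authors' code. PGD takes a gradient step and then projects back onto the ε-ball. Frank-Wolfe instead calls a linear oracle (usually called the linear minimization oracle, written here for maximization) that has a closed-form answer for ℓ∞, ℓ2, and ℓ1 balls. It then moves toward that point with step size 2/(t+2), so every iterate stays feasible as a convex combination of feasible points. The toy loss f(x) = w·x − ½‖x‖², the weights w, and the names lmo, frank_wolfe_attack, toy_loss, and toy_grad are hypothetical. A real attack would obtain the gradient from a network via autodiff and would also clip to the valid pixel range.

```python
import math


def lmo(grad, x0, eps, norm):
    """Return argmax over ||s - x0||_p <= eps of <grad, s> (closed form per norm)."""
    n = len(grad)
    if norm == "inf":
        # l-inf ball: shift every coordinate by eps in the direction of the gradient's sign.
        d = [math.copysign(1.0, g) if g != 0 else 0.0 for g in grad]
    elif norm == "2":
        # l2 ball: shift along the normalized gradient.
        g_norm = math.sqrt(sum(g * g for g in grad))
        d = [g / g_norm for g in grad] if g_norm > 0 else [0.0] * n
    elif norm == "1":
        # l1 ball: spend the whole budget on the largest-magnitude gradient coordinate.
        i = max(range(n), key=lambda k: abs(grad[k]))
        d = [0.0] * n
        if grad[i] != 0:
            d[i] = math.copysign(1.0, grad[i])
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    return [xi + eps * di for xi, di in zip(x0, d)]


def frank_wolfe_attack(loss_grad, x0, eps, norm, steps=50):
    """Frank-Wolfe ascent on the loss inside the eps-ball around x0, with no projection step."""
    x = list(x0)
    for t in range(steps):
        s = lmo(loss_grad(x), x0, eps, norm)
        gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        # Convex combination of two feasible points, so x never leaves the ball.
        x = [xi + gamma * (si - xi) for xi, si in zip(x, s)]
    return x


# Hypothetical toy loss standing in for a classifier's loss: f(x) = w.x - 0.5 * ||x||^2.
w = [1.0, -2.0, 0.5]


def toy_loss(x):
    return sum(wi * xi for wi, xi in zip(w, x)) - 0.5 * sum(xi * xi for xi in x)


def toy_grad(x):
    return [wi - xi for wi, xi in zip(w, x)]


x_adv = frank_wolfe_attack(toy_grad, [0.0, 0.0, 0.0], eps=0.1, norm="inf")
print(x_adv)  # → [0.1, -0.1, 0.1]
```

The ℓ∞ oracle returns x0 + ε·sign(∇f), and the first step uses γ = 1, so the first Frank-Wolfe iterate coincides with an FGSM step from x0. Later iterates differ because they average toward oracle points rather than projecting.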

attack · Inf. Sciences · Nov 2, 2025

T-MLA: A targeted multiscale log-exponential attack framework for neural image compression

Nikolay I. Kalmykov, Razan Dibo, Kaiyu Shen et al. · Skolkovo Institute of Science and Technology · Artificial Intelligence Research Institute +1 more

Wavelet-domain adversarial attack on neural image compression: imperceptibly perturbed inputs produce severely degraded reconstructions

Input Manipulation Attack · vision

1 citation · code available
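The key idea in the T-MLA entry is that the perturbation is applied in the wavelet domain rather than directly to pixels. The sketch below shows only that parametrization. It assumes a one-level, 1-D orthonormal Haar transform for brevity, whereas images would use a 2-D, multilevel transform. The sketch perturbs the detail (high-frequency) coefficients within a per-coefficient budget eps and then inverts the transform. The names haar_forward, haar_inverse, and perturb_details and the example signal are hypothetical. The sketch does not cover optimizing delta against a compression model or T-MLA's specific log-exponential objective.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal 1-D Haar transform (len(x) must be even)."""
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly (up to floating-point rounding)."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def perturb_details(x, delta, eps):
    """Add a perturbation, clipped to [-eps, eps] per coefficient, to the detail coefficients only.

    In an actual attack, delta would be optimized against the target model; that step is omitted.
    """
    approx, detail = haar_forward(x)
    detail = [d + max(-eps, min(eps, dd)) for d, dd in zip(detail, delta)]
    return haar_inverse(approx, detail)


# Hypothetical 4-sample signal; delta is clipped to [0.03, -0.03] by eps.
signal = [0.2, 0.4, 0.6, 0.8]
perturbed = perturb_details(signal, delta=[0.05, -0.05], eps=0.03)
```

The Haar basis is orthonormal, so the ℓ2 size of the pixel-space change equals the ℓ2 size of the coefficient change. Perturbing only the detail coefficients leaves each pair's local average unchanged.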

attack · arXiv · Sep 26, 2025

The Rogue Scalpel: Activation Steering Compromises LLM Safety

Anton Korznikov, Andrey Galichin, Alexey Dontsov et al. · Skolkovo Institute of Science and Technology

Activation steering, even with random or benign SAE (sparse autoencoder) vectors, reliably jailbreaks aligned LLMs by corrupting their internal hidden states at inference time

Prompt Injection · nlp

4 citations
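Mechanically, the activation steering described in the Rogue Scalpel entry adds a fixed vector to a layer's hidden states during the forward pass. The sketch below assumes plain Python lists for the hidden states, a single layer, and a scalar strength alpha. In a real model, the addition would typically be applied through a forward hook on a chosen layer's residual stream. The names random_unit_vector and steer and all numbers are hypothetical and are not taken from the paper. The sketch shows only the edit itself and does not reproduce the paper's finding that such edits compromise safety alignment.

```python
import math
import random


def random_unit_vector(dim, seed=0):
    """Sample a random direction of unit l2 norm."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def steer(hidden_states, direction, alpha):
    """Add alpha * direction to every token's hidden state: h' = h + alpha * v."""
    return [[h + alpha * d for h, d in zip(token, direction)] for token in hidden_states]


# Hypothetical hidden states: 2 tokens, 4 dimensions (real models use thousands).
hidden = [[0.5, -1.0, 0.0, 2.0], [1.5, 0.0, -0.5, 0.25]]
v = random_unit_vector(4, seed=42)
steered = steer(hidden, v, alpha=8.0)
```

Every token's hidden state is displaced by exactly alpha in ℓ2 norm along the same direction. The edit therefore needs no access to the prompt or the weights, only to the activations at inference time.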
