Latest papers

4 papers
defense arXiv Mar 3, 2026

Conditioned Activation Transport for T2I Safety Steering

Maciej Chrabąszcz, Aleksander Szymczyk, Jan Dubiński et al. · NASK National Research Institute · Warsaw University of Technology +3 more

Proposes conditioned activation transport to steer text-to-image (T2I) model activations away from unsafe regions while preserving image quality

Prompt Injection · vision · multimodal · generative
attack arXiv Dec 10, 2025

Membership and Dataset Inference Attacks on Large Audio Generative Models

Jakub Proboszcz, Paweł Kochanski, Karol Korszun et al. · Warsaw University of Technology · Sapienza University of Rome +2 more

Extends dataset inference (DI) attacks to audio generative models, showing that DI succeeds at copyright verification where single-sample membership inference attacks (MIA) fail

Membership Inference Attack · audio · generative
attack arXiv Nov 10, 2025

On Stealing Graph Neural Network Models

Marcin Podhajski, Jan Dubiński, Franziska Boenisch et al. · Polish Academy of Sciences · IDEAS NCBR +5 more

Steals GNN models with as few as 100 queries by decoupling query-free backbone extraction from strategic head extraction

Model Theft · graph
defense arXiv Oct 9, 2025

Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses

Stanisław Pawlak, Jan Dubiński, Daniel Marczak et al. · Warsaw University of Technology · NASK National Research Institute +3 more

Proposes Backdoor Vectors to unify backdoor attacks in model merging, plus a stronger SBV attack and an assumption-free IBVS defense

Model Poisoning · vision · multimodal