
Latent Transfer Attack: Adversarial Examples via Generative Latent Spaces

Eitan Shaar 1, Ariel Shaulov 2, Yalcin Tur 3, Gal Chechik 4,5, Ravid Shwartz-Ziv 6


Published on arXiv (2603.06311)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

LTA achieves strong CNN-to-ViT transfer attack success and improved robustness to purification-based defenses, with frequency analysis confirming perturbations concentrate in low-frequency bands unlike pixel-space baselines.

LTA (Latent Transfer Attack)

Novel technique introduced


Adversarial attacks are a central tool for probing the robustness of modern vision models, yet most methods optimize perturbations directly in pixel space under $\ell_\infty$ or $\ell_2$ constraints. While effective in white-box settings, pixel-space optimization often produces high-frequency, texture-like noise that is brittle to common preprocessing (e.g., resizing and cropping) and transfers poorly across architectures. We propose $\textbf{LTA}$ ($\textbf{L}$atent $\textbf{T}$ransfer $\textbf{A}$ttack), a transfer-based attack that instead optimizes perturbations in the latent space of a pretrained Stable Diffusion VAE. Given a clean image, we encode it into a latent code and optimize the latent representation to maximize a surrogate classifier loss, while softly enforcing a pixel-space $\ell_\infty$ budget after decoding. To improve robustness to resolution mismatch and standard input pipelines, we incorporate Expectation Over Transformations (EOT) via randomized resizing, interpolation, and cropping, and apply periodic latent Gaussian smoothing to suppress emerging artifacts and stabilize optimization. Across a suite of CNN and vision-transformer targets, LTA achieves strong transfer attack success while producing spatially coherent, predominantly low-frequency perturbations that differ qualitatively from pixel-space baselines and occupy a distinct point in the transfer-quality trade-off. Our results highlight pretrained generative latent spaces as an effective and structured domain for adversarial optimization, bridging robustness evaluation with modern generative priors.
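The core loop described in the abstract (encode, ascend the surrogate loss in latent space, softly enforce a pixel-space $\ell_\infty$ budget after decoding) can be sketched in a few lines. The linear encoder/decoder and classifier below are hypothetical stand-ins for the Stable Diffusion VAE and the pretrained surrogate network; only the structure of the optimization matches the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): linear maps play the role of the SD-VAE
# encoder/decoder and of the surrogate classifier, purely to illustrate
# the latent-space optimization loop.
D, Z, C = 16, 8, 3                     # pixel dim, latent dim, classes
W_enc = rng.normal(size=(Z, D)) / np.sqrt(D)
W_dec = np.linalg.pinv(W_enc)          # decode ~ (pseudo)inverse of encode
W_cls = rng.normal(size=(C, D))        # surrogate classifier weights

encode = lambda x: W_enc @ x
decode = lambda z: W_dec @ z

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def lta_attack(x_clean, y_true, eps=0.05, lr=0.5, steps=50):
    """Latent-space attack sketch: gradient-ascend the surrogate
    cross-entropy loss w.r.t. the latent code, softly projecting the
    decoded image back into a pixel-space l_inf budget each step."""
    z = encode(x_clean)
    for _ in range(steps):
        x_adv = decode(z)
        p = softmax(W_cls @ x_adv)
        p[y_true] -= 1.0                   # dCE/dlogits = p - onehot(y)
        grad_z = W_dec.T @ (W_cls.T @ p)   # chain rule back to the latent
        z = z + lr * grad_z                # maximize surrogate loss
        # soft pixel-space l_inf enforcement, then re-encode
        x_proj = x_clean + np.clip(decode(z) - x_clean, -eps, eps)
        z = encode(x_proj)
    # final hard clip so the returned image respects the budget exactly
    return x_clean + np.clip(decode(z) - x_clean, -eps, eps)

x = rng.normal(size=D)
x_adv = lta_attack(x, y_true=0)
```

Because the perturbation is parameterized by the latent code, its pixel-space footprint inherits the decoder's smoothness, which is why the resulting noise is spatially coherent rather than texture-like.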


Key Contributions

  • LTA: a transfer-based adversarial attack that optimizes perturbations in the latent space of a pretrained Stable Diffusion VAE, producing spatially coherent low-frequency perturbations that transfer better across architectures than pixel-space attacks
  • EOT-based robustness via randomized resizing, interpolation, and cropping during latent optimization to survive standard input preprocessing pipelines
  • Periodic latent Gaussian smoothing to suppress high-frequency artifacts and stabilize optimization trajectory in latent space
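The two stabilizers in the contributions above can be illustrated in isolation. The sketch below uses a 1-D signal as a stand-in for an image: `random_resize_crop` is one EOT sample (the paper applies randomized resizing, interpolation, and cropping to decoded images), and `gaussian_smooth` is the periodic latent smoothing step; both functions and their parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_resize_crop(x, out_len=12):
    """One EOT sample (assumption: 1-D stand-in for the paper's
    randomized resize / interpolation / crop of images)."""
    scale = rng.uniform(0.8, 1.2)
    new_len = max(out_len, int(round(len(x) * scale)))
    # linear interpolation acts as the randomized resize
    resized = np.interp(np.linspace(0, len(x) - 1, new_len),
                        np.arange(len(x)), x)
    start = rng.integers(0, new_len - out_len + 1)
    return resized[start:start + out_len]       # randomized crop

def gaussian_smooth(z, sigma=1.0):
    """Periodic latent smoothing: convolve with a normalized Gaussian
    kernel to suppress emerging high-frequency artifacts."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    return np.convolve(z, k, mode="same")

z = rng.normal(size=32)          # toy latent code
view = random_resize_crop(z)     # one randomized view for EOT
z_smooth = gaussian_smooth(z)    # periodically applied during optimization
```

In the attack proper, gradients would be averaged over many such randomized views so the perturbation survives standard input pipelines, and the smoothing would be applied every few optimization steps rather than once.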

🛡️ Threat Analysis

Input Manipulation Attack

Core contribution is a gradient-based adversarial attack that generates evasion examples at inference time by optimizing in the latent space of a pretrained VAE to maximize surrogate classifier loss, targeting CNN and ViT classifiers in a black-box transfer setting.


Details

Domains
vision
Model Types
cnn, transformer, diffusion
Threat Tags
white_box, black_box, inference_time, untargeted, digital
Datasets
ImageNet
Applications
image classification