
Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation

Xiaosen Wang 1,2, Zhijin Ge 1, Bohan Liu 1, Zheng Fang 3, Fengfan Zhou 1,2, Ruixuan Zhang 4, Shaokang Wang 5, Yuyang Luo 1,2



Published on arXiv: 2602.23117

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Proposes the most comprehensive overview and standardized benchmark of transfer-based adversarial attacks to date, identifying cases where prior methods fail to outperform baselines under fair evaluation conditions.

TransferAttack

Novel technique introduced


Adversarial transferability refers to the ability of adversarial examples crafted on a surrogate model to deceive other, unseen victim models. This property eliminates the need for direct access to the victim model during an attack, raising considerable security concerns in practical applications and attracting substantial research attention in recent years. In this work, we identify the lack of a standardized framework and criteria for evaluating transfer-based attacks, which leads to potentially biased assessments of existing approaches. To address this gap, we conduct an exhaustive review of hundreds of related works, organizing transfer-based attacks into six distinct categories. We then propose a comprehensive framework to serve as a benchmark for evaluating these attacks. In addition, we summarize common strategies that enhance adversarial transferability and highlight prevalent issues that can lead to unfair comparisons. Finally, we briefly review transfer-based attacks beyond image classification.


Key Contributions

  • Systematic categorization of 100+ transfer-based adversarial attacks into six classes: gradient-based, input transformation, advanced objective function, generation-based, model-related, and ensemble-based
  • Unified evaluation framework (TransferAttack) for standardized and fair comparison of transfer-based attacks across both untargeted and targeted settings
  • Identification of prevalent methodological issues causing unfair comparisons in existing transfer-based attack literature
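To make one of the six categories concrete, ensemble-based attacks average the attack gradient over several surrogate models before each perturbation step. The following is a minimal NumPy sketch, not the paper's implementation: the linear softmax classifiers and all weight values are toy placeholders chosen only for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ensemble_attack(x, y, surrogates, eps=0.6, alpha=0.1, steps=10):
    """Ensemble-based iterative sign-gradient sketch: average the
    cross-entropy gradient over several linear surrogates (logits z = W @ x),
    take a signed ascent step, and project back into the L-inf ball of
    radius eps around the clean input x."""
    onehot = np.zeros(surrogates[0].shape[0])
    onehot[y] = 1.0
    x_adv = x.copy()
    for _ in range(steps):
        # d(cross-entropy)/dx for a linear softmax model is W.T @ (p - onehot)
        grad = np.mean(
            [W.T @ (softmax(W @ x_adv) - onehot) for W in surrogates], axis=0
        )
        x_adv = np.clip(x_adv + alpha * np.sign(grad), x - eps, x + eps)
    return x_adv
```

Averaging gradients across diverse surrogates tends to suppress model-specific perturbation directions, which is why the ensemble-based family generally transfers better than attacking a single surrogate.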

🛡️ Threat Analysis

Input Manipulation Attack

The entire paper focuses on transfer-based adversarial examples — crafting perturbations on surrogate models that transfer to fool black-box victim models at inference time. All six attack categories reviewed (gradient-based, input transformation, advanced objectives, generation-based, model-related, ensemble-based) are subtypes of adversarial input manipulation attacks.
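The surrogate-to-victim mechanism can be sketched end to end with an untargeted I-FGSM-style attack. This is a hedged toy illustration, not code from the paper or the TransferAttack framework: the two linear softmax classifiers (`W_sur`, `W_vic`) and their weights are hypothetical stand-ins for a surrogate and a black-box victim.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ifgsm(x, y, W, eps=0.6, alpha=0.1, steps=10):
    """Untargeted I-FGSM sketch on a linear softmax surrogate (logits z = W @ x).
    The cross-entropy gradient w.r.t. x is W.T @ (softmax(W @ x) - onehot(y));
    each step ascends the loss by a signed step of size alpha, then projects
    back into the L-inf ball of radius eps around the clean input."""
    onehot = np.zeros(W.shape[0])
    onehot[y] = 1.0
    x_adv = x.copy()
    for _ in range(steps):
        grad = W.T @ (softmax(W @ x_adv) - onehot)
        x_adv = np.clip(x_adv + alpha * np.sign(grad), x - eps, x + eps)
    return x_adv

# Toy demo: craft on the surrogate only, then check transfer to the victim.
x = np.array([1.0, 0.0, 0.0])                             # clean input, true class 0
W_sur = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 1.0]])      # white-box surrogate
W_vic = np.array([[1.5, 0.2, 0.0], [0.1, 1.2, 0.9]])      # unseen black-box victim
x_adv = ifgsm(x, 0, W_sur)
print(np.argmax(W_sur @ x_adv), np.argmax(W_vic @ x_adv))  # → 1 1 (both fooled)
```

The key point the demo captures is that the attacker never queries `W_vic`: the perturbation is computed entirely from the surrogate's gradients, yet it flips the victim's prediction as well.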


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, white_box, inference_time, targeted, untargeted, digital
Datasets
ImageNet
Applications
image classification, face recognition, autonomous driving