Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation
Xiaosen Wang 1,2, Zhijin Ge 1, Bohan Liu 1, Zheng Fang 3, Fengfan Zhou 1,2, Ruixuan Zhang 4, Shaokang Wang 5, Yuyang Luo 1,2
Published on arXiv
2602.23117
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Proposes the most comprehensive overview and standardized benchmark of transfer-based adversarial attacks to date, identifying cases where prior methods fail to outperform baselines under fair evaluation conditions.
TransferAttack
Novel technique introduced
Adversarial transferability refers to the capacity of adversarial examples crafted on a surrogate model to deceive other, unseen victim models. This property eliminates the need for direct access to the victim model during an attack, raising considerable security concerns in practical applications and attracting substantial research attention in recent years. In this work, we identify the lack of a standardized framework and criteria for evaluating transfer-based attacks, which can lead to biased assessments of existing approaches. To close this gap, we conduct an exhaustive review of hundreds of related works, organizing the various transfer-based attacks into six distinct categories. We then propose a comprehensive framework designed to serve as a benchmark for evaluating these attacks. In addition, we delineate common strategies that enhance adversarial transferability and highlight prevalent issues that can lead to unfair comparisons. Finally, we provide a brief review of transfer-based attacks beyond image classification.
Key Contributions
- Systematic categorization of 100+ transfer-based adversarial attacks into six classes: gradient-based, input transformation, advanced objective function, generation-based, model-related, and ensemble-based
- Unified evaluation framework (TransferAttack) for standardized and fair comparison of transfer-based attacks across both untargeted and targeted settings
- Identification of prevalent methodological issues causing unfair comparisons in existing transfer-based attack literature
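To make the first category concrete, the sketch below illustrates a gradient-based transfer attack in the style of MI-FGSM (momentum-accumulated iterative FGSM) on a toy linear surrogate, where the gradient of the score is analytic. Function and variable names (`mi_fgsm`, `w_surrogate`) are illustrative, not from the paper's benchmark code.

```python
import numpy as np

def mi_fgsm(x, w_surrogate, eps=0.1, steps=10, mu=1.0):
    """MI-FGSM-style sketch against a toy linear surrogate scoring w.x.

    Lowering the score (toward misclassification) means stepping
    against the gradient, which for a linear score is simply w.
    """
    alpha = eps / steps              # per-step budget
    g = np.zeros_like(x)             # accumulated momentum
    x_adv = x.copy()
    for _ in range(steps):
        grad = w_surrogate           # d(w.x)/dx for a linear score
        # momentum update with L1-normalized gradient (MI-FGSM style)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv - alpha * np.sign(g)        # descend the score
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay in the L-inf ball
    return x_adv
```

Transferability shows up when the same perturbation, built only from the surrogate's weights, also lowers the score of a distinct victim model whose weights the attacker never saw.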
🛡️ Threat Analysis
The entire paper focuses on transfer-based adversarial examples — crafting perturbations on surrogate models that transfer to fool black-box victim models at inference time. All six attack categories reviewed (gradient-based, input transformation, advanced objectives, generation-based, model-related, ensemble-based) are subtypes of adversarial input manipulation attacks.