
Benchmarking Adversarial Robustness and Adversarial Training Strategies for Object Detection

Alexis Winter, Jean-Vincent Martini, Romaric Audigier, Angelique Loesch, Bertrand Luvison

arXiv (Cornell University)


Published on arXiv: 2602.16494

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Adversarial attacks transfer poorly to transformer-based detectors, and adversarial training on a mix of spatially and semantically targeted high-perturbation attacks yields the most robust defense


Object detection models are critical components of automated systems, such as autonomous vehicles and perception-based robots, but their sensitivity to adversarial attacks poses a serious security risk. Progress in defending these models lags behind progress for image classification, hindered by a lack of standardized evaluation. It is nearly impossible to thoroughly compare attack or defense methods, as existing work uses different datasets, inconsistent efficiency metrics, and varied measures of perturbation cost. This paper addresses this gap by investigating three key questions: (1) How can we create a fair benchmark to impartially compare attacks? (2) How well do modern attacks transfer across different architectures, especially from Convolutional Neural Networks to Vision Transformers? (3) What is the most effective adversarial training strategy for robust defense? To answer these, we first propose a unified benchmark framework focused on digital, non-patch-based attacks. This framework introduces specific metrics to disentangle localization and classification errors and evaluates attack cost using multiple perceptual metrics. Using this benchmark, we conduct extensive experiments on state-of-the-art attacks and a wide range of detectors. Our findings reveal two major conclusions: first, modern adversarial attacks against object detection models show a significant lack of transferability to transformer-based architectures. Second, we demonstrate that the most robust adversarial training strategy leverages a dataset composed of a mix of high-perturbation attacks with different objectives (e.g., spatial and semantic), which outperforms training on any single attack.


Key Contributions

  • Unified benchmark framework for digital non-patch-based adversarial attacks on object detectors, introducing AP_loc and CSR metrics to disentangle localization vs. classification failures
  • Cross-architectural transferability analysis revealing that modern adversarial attacks fail to transfer from CNN-based to transformer-based detectors (e.g., DINO)
  • Empirical finding that adversarial training on a mix of high-perturbation attacks with complementary objectives (spatial + semantic) outperforms training on any single attack type
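The paper's own definitions of AP_loc and CSR are not reproduced in this summary, but the underlying idea of the first contribution, separating localization failures from classification failures, can be sketched as follows. This is a hedged illustration, not the paper's metrics: `failure_breakdown` and its IoU threshold are hypothetical names and choices.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def failure_breakdown(preds, gts, iou_thr=0.5):
    """Split ground-truth outcomes into correct / misclassified / mislocalized.

    preds, gts: lists of (box, label) pairs. For each ground truth, the
    best-overlapping prediction decides the outcome: no box above the IoU
    threshold counts as a localization failure; a well-localized box with
    the wrong label counts as a classification failure.
    """
    correct = miscls = misloc = 0
    for g_box, g_label in gts:
        best = max(preds, key=lambda p: iou(p[0], g_box), default=None)
        if best is None or iou(best[0], g_box) < iou_thr:
            misloc += 1   # attack suppressed or displaced the box
        elif best[1] != g_label:
            miscls += 1   # box survived, but the class flipped
        else:
            correct += 1
    return correct, miscls, misloc

gts = [((0, 0, 10, 10), "car"), ((20, 20, 30, 30), "person")]
preds = [((0, 0, 10, 10), "car"), ((21, 21, 31, 31), "dog")]
print(failure_breakdown(preds, gts))  # → (1, 1, 0)
```

Keeping the two failure counts separate is what lets a benchmark say whether a given attack primarily destroys boxes (spatial objective) or flips labels (semantic objective).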

🛡️ Threat Analysis

Input Manipulation Attack

The paper benchmarks digital, non-patch-based adversarial attacks applied at inference time against object detectors, and evaluates adversarial training as a defense; the entire paper is organized around input manipulation attacks and robustness to them.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, grey_box, digital, inference_time
Datasets
COCO
Applications
object detection, autonomous driving, video surveillance, perception-based robotics