Benchmarking Adversarial Robustness and Adversarial Training Strategies for Object Detection

Object detection models are critical components of automated systems, such as autonomous vehicles and perception-based robots, but their sensitivity to adversarial attacks poses a serious security risk. Progress in defending these models lags behind that for image classification, hindered by a lack of standardized evaluation. Existing work uses different datasets, inconsistent efficiency metrics, and varied measures of perturbation cost, which makes it nearly impossible to compare attack or defense methods thoroughly. This paper addresses this gap by investigating three key questions: (1) How can we create a fair benchmark to compare attacks impartially? (2) How well do modern attacks transfer across architectures, especially from Convolutional Neural Networks (CNNs) to Vision Transformers? (3) What is the most effective adversarial training strategy for robust defense? To answer these, we first propose a unified benchmark framework focused on digital, non-patch-based attacks. This framework introduces specific metrics to disentangle localization and classification errors and evaluates attack cost using multiple perceptual metrics. Using this benchmark, we conduct extensive experiments on state-of-the-art attacks and a wide range of detectors. Our findings support two major conclusions. First, modern adversarial attacks against object detection models show a significant lack of transferability to transformer-based architectures. Second, we demonstrate that the most robust adversarial training strategy uses a dataset composed of a mix of high-perturbation attacks with different objectives (e.g., spatial and semantic), which outperforms training on any single attack.
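The abstract stresses that attack cost should be reported with several measures rather than a single norm. The sketch below shows that idea for one clean/adversarial image pair, assuming pixel values in [0, 1]. The metric set (L∞, L2, MSE, PSNR) and the function name `perturbation_cost` are our own illustrative choices, not necessarily what the paper reports; learned perceptual metrics such as SSIM or LPIPS would be added to the returned dictionary in the same way.

```python
import numpy as np


def perturbation_cost(clean: np.ndarray, adv: np.ndarray, max_val: float = 1.0) -> dict:
    """Report several cost measures for one adversarial image (hypothetical helper).

    Assumes both arrays share a shape such as (H, W, C) and hold pixel
    values in [0, max_val].
    """
    if clean.shape != adv.shape:
        raise ValueError("clean and adversarial images must have the same shape")
    delta = adv.astype(np.float64) - clean.astype(np.float64)
    mse = float(np.mean(delta ** 2))
    # PSNR is undefined (infinite) when the two images are identical.
    psnr = float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))
    return {
        "linf": float(np.max(np.abs(delta))),   # largest single-pixel change
        "l2": float(np.linalg.norm(delta.ravel())),  # total perturbation energy
        "mse": mse,
        "psnr_db": psnr,
    }
```

Reporting all of these together makes it harder for an attack to look "cheap" under one norm while being clearly visible under another.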

Key Contributions

Unified benchmark framework for digital non-patch-based adversarial attacks on object detectors, introducing AP_loc and CSR metrics to disentangle localization vs. classification failures
Cross-architectural transferability analysis revealing that modern adversarial attacks fail to transfer from CNN-based to transformer-based detectors (e.g., DINO)
Empirical finding that adversarial training on a mix of high-perturbation attacks with complementary objectives (spatial + semantic) outperforms training on any single attack type
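The exact definitions of AP_loc and CSR are not given in this summary, so the sketch below is only a hypothetical proxy for the underlying idea: match predictions to ground truth while ignoring class labels, then check labels separately. This splits each ground-truth object into "missed" (no box found), "misclassified" (box found, wrong class), or "correct". The function names, the greedy matching, and the 0.5 IoU threshold are assumptions; a real AP-style metric would also rank detections by confidence score.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def split_errors(gt_boxes: Sequence[Box], gt_labels: Sequence[int],
                 pred_boxes: Sequence[Box], pred_labels: Sequence[int],
                 iou_thr: float = 0.5) -> Dict[str, int]:
    """Class-agnostic greedy matching, then a separate label check (hypothetical)."""
    used = set()
    counts = {"correct": 0, "misclassified": 0, "missed": 0}
    for g_box, g_label in zip(gt_boxes, gt_labels):
        best_j, best_iou = -1, iou_thr
        for j, p_box in enumerate(pred_boxes):
            if j in used:
                continue  # each prediction may match at most one object
            overlap = iou(g_box, p_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j < 0:
            counts["missed"] += 1  # localization failure
            continue
        used.add(best_j)
        if pred_labels[best_j] == g_label:
            counts["correct"] += 1
        else:
            counts["misclassified"] += 1  # localized, but classification failure
    return counts


def summarize(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (localization rate, classification rate among localized objects)."""
    n_gt = sum(counts.values())
    localized = counts["correct"] + counts["misclassified"]
    loc_rate = localized / n_gt if n_gt else 0.0
    cls_rate = counts["correct"] / localized if localized else 0.0
    return loc_rate, cls_rate
```

Separating the two rates shows whether an attack mainly hides objects (low localization rate) or mainly relabels them (high localization rate, low classification rate).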

🛡️ Threat Analysis

Input Manipulation Attack

The paper benchmarks digital, non-patch-based adversarial attacks applied at inference time to object detectors and evaluates adversarial training as a defense; the entire paper is organized around input manipulation attacks and robustness to them.
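The paper's key defense finding is that training on a mix of high-perturbation attacks with different objectives beats training on any single attack. The sketch below shows one way such a mixed training set could be assembled from precomputed adversarial examples. The equal-share sampling, the function name `mix_attack_pools`, and the pool names `"spatial"` and `"semantic"` are hypothetical placeholders, not the paper's recipe; the resulting images would then feed an ordinary adversarial fine-tuning loop, which is omitted here.

```python
from typing import Dict, List, Tuple

import numpy as np


def mix_attack_pools(pools: Dict[str, np.ndarray], n_total: int,
                     seed: int = 0) -> Tuple[np.ndarray, List[str]]:
    """Draw roughly equal shares of adversarial images from each attack (hypothetical).

    `pools` maps an attack name to an array of shape (N_attack, H, W, C).
    Returns the shuffled images and the attack that produced each one.
    """
    rng = np.random.default_rng(seed)
    names = sorted(pools)
    base, extra = divmod(n_total, len(names))
    images, sources = [], []
    for i, name in enumerate(names):
        k = base + (1 if i < extra else 0)  # spread the remainder over the first pools
        pool = pools[name]
        # Sample with replacement only if the pool is too small.
        idx = rng.choice(len(pool), size=k, replace=k > len(pool))
        images.append(pool[idx])
        sources.extend([name] * k)
    order = rng.permutation(n_total)  # shuffle so batches mix attack types
    return np.concatenate(images)[order], [sources[i] for i in order]
```

Keeping the source label per sample makes it easy to check afterwards that each attack objective is actually represented in the training mix.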

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

white_box, grey_box, digital, inference_time

Datasets

COCO

Year

2026


Similar Papers

Adversarial Attacks Leverage Interference Between Features in Superposition

When Flatness Does (Not) Guarantee Adversarial Robustness

How Worst-Case Are Adversarial Attacks? Linking Adversarial and Perturbation Robustness

Probabilistic Robustness for Free? Revisiting Training via a Benchmark

OTI: A Model-free and Visually Interpretable Measure of Image Attackability

Formal Reasoning About Confidence and Automated Verification of Neural Networks

Exploring Sparsity and Smoothness of Arbitrary $\ell_p$ Norms in Adversarial Attacks

Solving adversarial examples requires solving exponential misalignment