
Out-of-the-box: Black-box Causal Attacks on Object Detectors

Melane Navaratnarajah, David A. Kelly, Hana Chockler

1 citation · 44 references · arXiv


Published on arXiv · 2512.03730

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

On COCO, BlackCAtt outperforms baseline black-box attacks by 2.7× (detection removal), 3.86× (detection modification), and 5.75× (spurious detection injection) while remaining imperceptible (L2 < 4/255).
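The imperceptibility claim can be made concrete with a small check of the perturbation's L2 distance against the reported 4/255 budget. This is a minimal sketch: the exact norm convention (mean per-pixel L2, as assumed here, vs. total L2) is an assumption for illustration, not taken from the paper.

```python
import numpy as np

def l2_distance(original, adversarial):
    """Mean per-pixel L2 distance between two images in [0, 1].

    Assumption: the paper's L2 < 4/255 bound is interpreted here as a
    mean per-pixel (RMS) distance; the actual convention may differ.
    """
    diff = (adversarial - original).astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
# A small random perturbation, well inside the reported 4/255 budget
adv = np.clip(img + rng.normal(0.0, 1 / 1020, img.shape), 0.0, 1.0)
print(l2_distance(img, adv) < 4 / 255)
```

Under this convention, any attack satisfying the check would alter each pixel by well under 2% of the intensity range on average, which is consistent with the paper's description of the attacks as imperceptible.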

BlackCAtt

Novel technique introduced


Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box and architecture-specific. More importantly, while they are often successful, it is rarely clear why they work. Insights into the mechanism of this success would allow developers to understand and analyze these attacks, as well as fine-tune the model to prevent them. This paper presents BlackCAtt, a black-box algorithm and tool that uses minimal, causally sufficient pixel sets to construct explainable, imperceptible, reproducible, architecture-agnostic attacks on object detectors. BlackCAtt combines causal pixels with the bounding boxes produced by object detectors to create adversarial attacks that lead to the loss, modification, or addition of a bounding box. BlackCAtt works across object detectors of different sizes and architectures, treating the detector as a black box. We compare the performance of BlackCAtt with other black-box attack methods and show that identifying causal pixels leads to more precisely targeted and less perceptible attacks. On the COCO test dataset, our approach is 2.7 times better than the baseline at removing a detection, 3.86 times better at changing a detection, and 5.75 times better at triggering new, spurious detections. The attacks generated by BlackCAtt stay very close to the original image, and are hence imperceptible, demonstrating the power of causal pixels.


Key Contributions

  • BlackCAtt: a black-box, architecture-agnostic adversarial attack that identifies causally sufficient pixel sets to construct explainable, imperceptible perturbations for object detectors
  • Causal pixel targeting enables attacks that are 2.7× better at suppressing detections, 3.86× better at modifying detections, and 5.75× better at injecting spurious detections versus baselines on COCO
  • Provides interpretability into why adversarial attacks succeed by grounding perturbations in causal pixel explanations rather than gradient signal

🛡️ Threat Analysis

Input Manipulation Attack

BlackCAtt crafts adversarial input perturbations at inference time to manipulate object detector outputs (bounding box removal, modification, or spurious addition) — a canonical input manipulation / evasion attack. It is black-box, architecture-agnostic, and evaluated against baseline black-box attack methods.
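The core black-box evasion loop can be sketched as follows. This is not the paper's algorithm: the toy `detector` and the `removal_attack` routine below are hypothetical stand-ins that only illustrate the idea of perturbing a candidate causally sufficient pixel set while querying the detector purely through its outputs.

```python
import numpy as np

def detector(image):
    """Stand-in for a black-box object detector. Returns a list of
    (x1, y1, x2, y2, label) boxes. Toy rule: 'detect' an object when
    the mean intensity of a fixed region exceeds a threshold."""
    region = image[16:48, 16:48]
    return [(16, 16, 48, 48, "object")] if region.mean() > 0.5 else []

def removal_attack(image, pixel_set, eps=4 / 255, max_steps=200, seed=0):
    """Illustrative sketch of a black-box detection-removal attack:
    perturb only the candidate causal pixels, re-querying the detector
    after each step until the target detection disappears.

    `eps` bounds each per-step change; a real attack (like BlackCAtt)
    would also control the overall distortion of the final image."""
    rng = np.random.default_rng(seed)
    ys, xs = pixel_set
    adv = image.copy()
    for _ in range(max_steps):
        adv[ys, xs] = np.clip(adv[ys, xs] - rng.random(len(ys)) * eps, 0.0, 1.0)
        if not detector(adv):       # detection removed: attack succeeded
            return adv
    return None                     # budget exhausted, attack failed

img = np.full((64, 64), 0.6)        # toy detector fires on this image
ys, xs = np.mgrid[16:48, 16:48]     # candidate causal pixel set
attack = removal_attack(img, (ys.ravel(), xs.ravel()))
print(attack is not None and not detector(attack))
```

The key property shown is that no gradients or internals are used: the attack only observes bounding-box outputs, which is what makes such attacks architecture-agnostic. Modification and spurious-injection variants would change the success test from "no boxes" to "different label" or "new box", respectively.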


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
COCO
Applications
object detection