
SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling

Ankit Gupta , Christoph Adami , Emily Dolson

Published on arXiv · arXiv:2512.06185 · 0 citations · 37 references

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SPOOF achieves near-certain misclassification confidence starting from a blank canvas using fewer than 2% active pixels, with ViT-B/16 requiring substantially fewer queries than ResNet-50 or AlexNet.

SPOOF

Novel technique introduced


Deep neural networks (DNNs) excel across image recognition tasks, yet continue to exhibit overconfidence on inputs that bear no resemblance to natural images. Revisiting the "fooling images" work introduced by Nguyen et al. (2015), we re-implement both CPPN-based and direct-encoding evolutionary fooling attacks on modern architectures, including convolutional and transformer classifiers. Our re-implementation confirms that high-confidence fooling persists even in state-of-the-art networks, with the transformer-based ViT-B/16 emerging as the most susceptible, achieving near-certain misclassification confidence with substantially fewer queries than convolution-based models. We then introduce SPOOF, a minimalist, consistent, and more efficient black-box attack that generates high-confidence fooling images. Despite its simplicity, SPOOF produces unrecognizable fooling images with minimal pixel modifications and drastically reduced compute. Furthermore, retraining with fooling images as an additional class provides only partial resistance: SPOOF continues to fool consistently with slightly higher query budgets, highlighting the persistent fragility of modern deep classifiers.


Key Contributions

  • Re-implementation and extension of CPPN- and direct-encoding fooling attacks to modern architectures (AlexNet, ResNet-50, ViT-B/16), confirming persistent vulnerability
  • Introduction of SPOOF, a greedy hill-climbing black-box attack that produces high-confidence fooling images using extremely sparse pixel modifications at drastically reduced compute cost
  • Empirical finding that ViT-B/16 is the most susceptible classifier, fooled with the fewest queries, while retraining with fooling images provides only partial resistance

🛡️ Threat Analysis

Input Manipulation Attack

SPOOF crafts inputs at inference time that cause DNNs to output high-confidence misclassifications — a direct input manipulation attack. The attack is black-box, query-based, and targeted at specific classes, starting from blank canvases and greedily modifying pixels to maximize target-class confidence.
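The greedy loop described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact procedure: the `query_fn` interface, the random single-pixel mutation operator, and the `sparsity` budget (mirroring the "fewer than 2% active pixels" finding) are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

def spoof_attack(query_fn, shape=(224, 224, 3), sparsity=0.02,
                 max_queries=10_000, conf_goal=0.99, seed=0):
    """Greedy hill-climbing fooling-image search (SPOOF-style sketch).

    query_fn: black-box callable returning the target class's
    confidence in [0, 1] for an image array; the only model access,
    matching the query-based threat model described above.
    """
    rng = np.random.default_rng(seed)
    img = np.zeros(shape, dtype=np.float32)        # blank canvas
    budget = int(sparsity * shape[0] * shape[1])   # cap on active pixels
    active = set()                                 # pixels modified so far
    best = query_fn(img)
    for _ in range(max_queries):
        if best >= conf_goal:                      # near-certain confidence
            break
        y = int(rng.integers(shape[0]))
        x = int(rng.integers(shape[1]))
        if (y, x) not in active and len(active) >= budget:
            continue                               # respect sparsity budget
        old = img[y, x].copy()
        img[y, x] = rng.random(shape[2])           # random pixel proposal
        new = query_fn(img)
        if new > best:                             # keep only improvements
            best = new
            active.add((y, x))
        else:
            img[y, x] = old                        # revert on failure
    return img, best
```

Against a real classifier, `query_fn` would wrap a softmax forward pass and index the target class; the paper reports that under such a query-based loop, ViT-B/16 is fooled with the fewest queries.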


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
ImageNet
Applications
image classification