
SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling

Ankit Gupta , Christoph Adami , Emily Dolson

Published on arXiv · arXiv:2512.06185 · 0 citations · 37 references

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

SPOOF achieves near-certain misclassification confidence starting from a blank canvas using fewer than 2% active pixels, with ViT-B/16 requiring substantially fewer queries than ResNet-50 or AlexNet.

SPOOF

Novel technique introduced


Deep neural networks (DNNs) excel across image recognition tasks, yet continue to exhibit overconfidence on inputs that bear no resemblance to natural images. Revisiting the "fooling images" work introduced by Nguyen et al. (2015), we re-implement both CPPN-based and direct-encoding evolutionary fooling attacks on modern architectures, including convolutional and transformer classifiers. Our re-implementation confirms that high-confidence fooling persists even in state-of-the-art networks, with the transformer-based ViT-B/16 emerging as the most susceptible, achieving near-certain misclassification confidence with substantially fewer queries than convolution-based models. We then introduce SPOOF, a minimalist, consistent, and more efficient black-box attack that generates high-confidence fooling images. Despite its simplicity, SPOOF produces unrecognizable fooling images with minimal pixel modifications and drastically reduced compute. Furthermore, retraining with fooling images as an additional class provides only partial resistance: SPOOF continues to fool consistently with slightly higher query budgets, highlighting the persistent fragility of modern deep classifiers.


Key Contributions

  • Re-implementation and extension of CPPN- and direct-encoding fooling attacks to modern architectures (AlexNet, ResNet-50, ViT-B/16), confirming persistent vulnerability
  • Introduction of SPOOF, a greedy hill-climbing black-box attack that produces high-confidence fooling images using extremely sparse pixel modifications at drastically reduced compute cost
  • Empirical finding that ViT-B/16 is the most susceptible classifier, fooled with the fewest queries, while retraining with fooling images provides only partial resistance

🛡️ Threat Analysis

Input Manipulation Attack

SPOOF crafts inputs at inference time that cause DNNs to output high-confidence misclassifications — a direct input manipulation attack. The attack is black-box, query-based, and targeted at specific classes, starting from blank canvases and greedily modifying pixels to maximize target-class confidence.
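The greedy loop described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact procedure: the `query_fn` interface, the random single-pixel mutation operator, and the `sparsity` budget (mirroring the "fewer than 2% active pixels" finding) are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

def spoof_attack(query_fn, shape=(224, 224, 3), sparsity=0.02,
                 max_queries=10_000, conf_goal=0.99, seed=0):
    """Greedy hill-climbing fooling-image search (SPOOF-style sketch).

    query_fn: black-box callable returning the target class's
    confidence in [0, 1] for an image array; the only model access,
    matching the query-based threat model described above.
    """
    rng = np.random.default_rng(seed)
    img = np.zeros(shape, dtype=np.float32)        # blank canvas
    budget = int(sparsity * shape[0] * shape[1])   # cap on active pixels
    active = set()                                 # pixels modified so far
    best = query_fn(img)
    for _ in range(max_queries):
        if best >= conf_goal:                      # near-certain confidence
            break
        y = int(rng.integers(shape[0]))
        x = int(rng.integers(shape[1]))
        if (y, x) not in active and len(active) >= budget:
            continue                               # respect sparsity budget
        old = img[y, x].copy()
        img[y, x] = rng.random(shape[2])           # random pixel proposal
        new = query_fn(img)
        if new > best:                             # keep only improvements
            best = new
            active.add((y, x))
        else:
            img[y, x] = old                        # revert on failure
    return img, best
```

Against a real classifier, `query_fn` would wrap a softmax forward pass and index the target class; the paper reports that under such a query-based loop, ViT-B/16 is fooled with the fewest queries.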


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
ImageNet
Applications
image classification