OTI: A Model-free and Visually Interpretable Measure of Image Attackability
Jiaming Liang 1,2, Haowei Liu, Chi-Man Pun 1
Published on arXiv (2601.17536)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
OTI reliably distinguishes image attackability across diverse attack configurations while requiring no model access and providing visual interpretability through semantic object texture analysis
OTI (Object Texture Intensity)
Novel technique introduced
Despite the tremendous success of neural networks, benign images can be corrupted by adversarial perturbations to deceive these models. Intriguingly, images differ in their attackability. Specifically, given an attack configuration, some images are easily corrupted, whereas others are more resistant. Evaluating image attackability has important applications in active learning, adversarial training, and attack enhancement. This prompts a growing interest in developing attackability measures. However, existing methods are scarce and suffer from two major limitations: (1) They rely on a model proxy to provide prior knowledge (e.g., gradients or minimal perturbation) to extract model-dependent image features. Unfortunately, in practice, many task-specific models are not readily accessible. (2) Extracted features characterizing image attackability lack visual interpretability, obscuring their direct relationship with the images. To address these, we propose a novel Object Texture Intensity (OTI), a model-free and visually interpretable measure of image attackability, which measures image attackability as the texture intensity of the image's semantic object. Theoretically, we describe the principles of OTI from the perspectives of decision boundaries as well as the mid- and high-frequency characteristics of adversarial perturbations. Comprehensive experiments demonstrate that OTI is effective and computationally efficient. In addition, our OTI provides the adversarial machine learning community with a visual understanding of attackability.
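To make the core idea concrete, here is a minimal sketch of an OTI-style score. The paper's exact formula is not reproduced here; this illustration simply assumes that "object texture intensity" can be approximated as the mean high-frequency energy (via a discrete Laplacian) of the image restricted to a semantic object mask, which matches the abstract's mid/high-frequency framing.

```python
import numpy as np

def object_texture_intensity(image, object_mask):
    """Illustrative OTI-style score (an assumption, not the paper's exact formula):
    mean high-frequency energy of the image within the semantic object region.

    image:       2-D float array (grayscale), values in [0, 1]
    object_mask: 2-D bool array marking the semantic object's pixels
    """
    # Approximate mid/high-frequency content with a 4-neighbor discrete Laplacian.
    lap = (
        -4.0 * image
        + np.roll(image, 1, axis=0) + np.roll(image, -1, axis=0)
        + np.roll(image, 1, axis=1) + np.roll(image, -1, axis=1)
    )
    # Texture intensity = mean absolute filter response inside the object mask.
    return float(np.abs(lap)[object_mask].mean())

# A smooth (low-texture) object should score lower than a noisy (high-texture) one.
rng = np.random.default_rng(0)
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
smooth = np.full((32, 32), 0.5)
textured = np.clip(0.5 + 0.2 * rng.standard_normal((32, 32)), 0.0, 1.0)
print(object_texture_intensity(smooth, mask) < object_texture_intensity(textured, mask))
```

Because the score depends only on the image and its object mask, it needs no model access, and the per-pixel Laplacian response can be visualized directly, mirroring the measure's claimed interpretability.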
Key Contributions
- First work to identify a relationship between semantic object texture intensity and image attackability, providing a model-free and visually interpretable metric
- Theoretical justification of OTI from both decision boundary and mid/high-frequency adversarial perturbation perspectives
- Comprehensive empirical validation of OTI across diverse attacks, domains, tasks, and defense/non-defense scenarios
🛡️ Threat Analysis
Directly addresses adversarial examples — OTI measures how susceptible each image is to adversarial perturbation attacks at inference time, with explicit applications to adversarial training (defense) and attack enhancement (offense). The entire paper is grounded in understanding why some images are easier to corrupt via adversarial perturbations.