Adversarial Evasion Attacks on Computer Vision using SHAP Values
Frank Mollard, Marcus Becker, Florian Roehrbein
Published on arXiv (2601.10587)
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
SHAP-based adversarial attacks generate more robust misclassifications than FGSM, particularly in gradient hiding scenarios where gradient-based attacks typically fail.
SHAP Attack
Novel technique introduced
The paper introduces a white-box attack on computer vision models using SHAP values. It demonstrates how adversarial evasion attacks can compromise the performance of deep learning models by reducing output confidence or inducing misclassifications. Such attacks are particularly insidious because they deceive the model while remaining imperceptible to the human eye. The proposed attack uses SHAP values to quantify the contribution of individual input features to the model's output at inference time, then perturbs the most influential ones. The authors compare the SHAP attack against the well-known Fast Gradient Sign Method (FGSM) and find evidence that SHAP attacks are more robust in generating misclassifications, particularly in gradient-hiding scenarios.
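The core mechanic (rank input features by their Shapley value, then perturb the most influential ones toward a baseline) can be sketched on a toy linear scorer. Everything below is an illustrative assumption, not the paper's setup: the authors attack deep vision models via the SHAP library, whereas this sketch estimates Shapley values by Monte Carlo permutation sampling over eight "pixel" features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a vision model: a linear scorer over 8 "pixel" features.
# (Illustrative assumption -- the paper uses deep CV models and the SHAP library.)
w = np.array([3.0, -2.0, 1.5, 0.5, -1.0, 2.5, -0.5, 1.0])

def model(x):
    """Scalar confidence score for the 'correct' class."""
    return float(w @ x)

def shapley_values(x, baseline, n_samples=500):
    """Monte Carlo permutation estimate of per-feature Shapley values."""
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        z = baseline.copy()
        prev = model(z)
        for i in rng.permutation(d):
            z[i] = x[i]                 # reveal feature i
            cur = model(z)
            phi[i] += cur - prev        # its marginal contribution
            prev = cur
    return phi / n_samples

def shap_attack(x, baseline, k=3):
    """Occlude the k features whose Shapley values most support the class."""
    phi = shapley_values(x, baseline)
    top = np.argsort(-phi)[:k]          # largest positive attributions
    x_adv = x.copy()
    x_adv[top] = baseline[top]          # push them to the baseline value
    return x_adv

x = np.ones(8)
baseline = np.zeros(8)
x_adv = shap_attack(x, baseline)
print(model(x), model(x_adv))           # confidence drops after the attack
```

For a linear model the Shapley value of feature i is exactly `w[i] * (x[i] - baseline[i])`, so occluding the top positively attributed features provably lowers the score; a real attack would additionally bound the perturbation to keep it imperceptible.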
Key Contributions
- Introduces a white-box adversarial evasion attack using SHAP values to identify and perturb the most influential input pixels/features
- Demonstrates that SHAP-based attacks are more effective than FGSM in gradient-hiding scenarios
- Provides comparative empirical evaluation of SHAP attacks versus FGSM on deep learning computer vision models
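For contrast, the FGSM baseline used in the comparison can be sketched on a toy logistic classifier. The weights, label, and epsilon below are illustrative assumptions; FGSM itself is the standard one-step method `x_adv = x + eps * sign(grad_x loss)`, which is exactly what fails when gradients are hidden or masked.

```python
import numpy as np

# Toy logistic classifier standing in for a vision model.
# (Illustrative assumption -- the paper evaluates FGSM on deep CV models.)
w = np.array([3.0, -2.0, 1.5, 0.5, -1.0, 2.5, -0.5, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, eps=0.25):
    """One-step FGSM: x + eps * sign of the input gradient of the loss."""
    p = sigmoid(w @ x)
    grad = (p - y) * w          # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad)

x = np.ones(8)
x_adv = fgsm(x, y=1.0)
print(sigmoid(w @ x), sigmoid(w @ x_adv))   # confidence in class 1 drops
```

Unlike the SHAP attack, this step depends entirely on `grad`; if the model masks or obfuscates its gradients, `np.sign(grad)` becomes uninformative while SHAP attributions, computed from forward passes alone, remain usable.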
🛡️ Threat Analysis
Directly proposes a white-box adversarial evasion attack that crafts imperceptible perturbations to cause misclassification at inference time, using SHAP values to identify and manipulate the most influential input features — a classic adversarial example attack.