
Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks

Ayushi Mehrotra 1, Derek Peng 2, Dipkamal Bhusal 3, Nidhi Rastogi 3

0 citations · 18 references · arXiv


Published on arXiv: 2510.04245

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser across all attack intensities on Imagenette with ResNet-50, without assuming the patch's size or location

Concept-Based Masking

Novel technique introduced


Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.


Key Contributions

  • Patch-agnostic defense that leverages CRAFT concept activation vectors to identify and suppress spatially localized adversarial patch influence without explicit patch detection
  • Eliminates the need for prior knowledge of patch size, shape, or location — a key limitation of prior defenses like PatchCleanser
  • Outperforms PatchCleanser in both robust accuracy and clean accuracy on Imagenette with ResNet-50 across varying patch sizes and locations
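The defense described above can be sketched at a high level. This is a minimal illustration under stated assumptions, not the paper's implementation: the authors extract concepts with CRAFT from ResNet-50 activations, whereas here a bare non-negative matrix factorization is run on synthetic activations, and total activation mass is used as a crude proxy for the paper's concept importance scores.

```python
import numpy as np

# Sketch of concept-based masking: factorize feature-map activations into
# concepts, zero out the most influential concept, and reconstruct.
# All data here is synthetic; names like `influence` are illustrative.
rng = np.random.default_rng(0)

# Stand-in for penultimate-layer activations: 64 spatial positions x 128 channels.
A = rng.random((64, 128))

# 1) NMF via multiplicative updates: A ~= U @ W, where rows of W play the
#    role of concept activation vectors and U holds per-position concept scores.
k = 10
U = rng.random((64, k)) + 1e-3
W = rng.random((k, 128)) + 1e-3
eps = 1e-9
for _ in range(200):
    U *= (A @ W.T) / (U @ W @ W.T + eps)
    W *= (U.T @ A) / (U.T @ U @ W + eps)

# 2) Rank concepts by total activation mass (proxy for importance; the paper
#    derives importance from concept-based explanations instead).
influence = U.sum(axis=0)
top_concept = int(np.argmax(influence))

# 3) Suppress the most influential concept and reconstruct the activations
#    that would be passed on to the classifier head.
U_masked = U.copy()
U_masked[:, top_concept] = 0.0
masked_activations = U_masked @ W

print(masked_activations.shape)
```

Because the mask is applied in concept space rather than pixel space, no assumption about the patch's size, shape, or location is needed, which is the key difference from mask-and-verify defenses like PatchCleanser.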

🛡️ Threat Analysis

Input Manipulation Attack

Adversarial patch attacks are localized inference-time perturbations that force misclassification; the paper's primary contribution is a patch-agnostic defense against this threat using concept-based explanation masking.


Details

Domains
vision
Model Types
cnn
Threat Tags
black_box, inference_time, targeted, digital, physical
Datasets
Imagenette
Applications
image classification, autonomous vehicles, facial recognition