
Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks

Ayushi Mehrotra 1, Derek Peng 2, Dipkamal Bhusal 3, Nidhi Rastogi 3

0 citations · 18 references · arXiv


Published on arXiv: 2510.04245

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser across all attack intensities on Imagenette with ResNet-50, without assuming the patch's size or location

Concept-Based Masking

Novel technique introduced


Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.


Key Contributions

  • Patch-agnostic defense that leverages CRAFT concept activation vectors to identify and suppress spatially localized adversarial patch influence without explicit patch detection
  • Eliminates the need for prior knowledge of patch size, shape, or location — a key limitation of prior defenses like PatchCleanser
  • Outperforms PatchCleanser in both robust accuracy and clean accuracy on Imagenette with ResNet-50 across varying patch sizes and locations
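The defense described above can be sketched at a high level. This is a minimal illustration under stated assumptions, not the paper's implementation: the authors extract concepts with CRAFT from ResNet-50 activations, whereas here a bare non-negative matrix factorization is run on synthetic activations, and total activation mass is used as a crude proxy for the paper's concept importance scores.

```python
import numpy as np

# Sketch of concept-based masking: factorize feature-map activations into
# concepts, zero out the most influential concept, and reconstruct.
# All data here is synthetic; names like `influence` are illustrative.
rng = np.random.default_rng(0)

# Stand-in for penultimate-layer activations: 64 spatial positions x 128 channels.
A = rng.random((64, 128))

# 1) NMF via multiplicative updates: A ~= U @ W, where rows of W play the
#    role of concept activation vectors and U holds per-position concept scores.
k = 10
U = rng.random((64, k)) + 1e-3
W = rng.random((k, 128)) + 1e-3
eps = 1e-9
for _ in range(200):
    U *= (A @ W.T) / (U @ W @ W.T + eps)
    W *= (U.T @ A) / (U.T @ U @ W + eps)

# 2) Rank concepts by total activation mass (proxy for importance; the paper
#    derives importance from concept-based explanations instead).
influence = U.sum(axis=0)
top_concept = int(np.argmax(influence))

# 3) Suppress the most influential concept and reconstruct the activations
#    that would be passed on to the classifier head.
U_masked = U.copy()
U_masked[:, top_concept] = 0.0
masked_activations = U_masked @ W

print(masked_activations.shape)
```

Because the mask is applied in concept space rather than pixel space, no assumption about the patch's size, shape, or location is needed, which is the key difference from mask-and-verify defenses like PatchCleanser.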

🛡️ Threat Analysis

Input Manipulation Attack

Adversarial patch attacks are localized inference-time perturbations that force misclassification; the paper's primary contribution is a patch-agnostic defense against this threat using concept-based explanation masking.


Details

Domains
vision
Model Types
cnn
Threat Tags
black_box, inference_time, targeted, digital, physical
Datasets
Imagenette
Applications
image classification, autonomous vehicles, facial recognition