Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks
Ayushi Mehrotra 1, Derek Peng 2, Dipkamal Bhusal 3, Nidhi Rastogi 3
Published on arXiv: 2510.04245
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser across all attack intensities on Imagenette with ResNet-50, without assuming patch size or location.
Novel technique introduced: Concept-Based Masking
Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.
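The abstract describes the defense only at a high level: decompose activations into concepts, rank the concepts by their influence on the prediction, and suppress the most influential one. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It factorizes non-negative spatial activations with non-negative matrix factorization (the factorization CRAFT builds on), but swaps CRAFT's Sobol-index concept importance for a simpler exact linear attribution through an assumed average-pool-plus-linear head. It then suppresses the top concept by masking the spatial positions where that concept is strongly expressed. The helper names `nmf` and `concept_mask_predict`, the rank, and the threshold `frac` are all hypothetical.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): NMF concepts, linear concept
# attribution, and spatial masking of the most influential concept.


def nmf(A, rank, iters=1000, eps=1e-9, seed=0):
    """Factor non-negative A (n x c) as U @ V with U (n x rank) >= 0 and
    V (rank x c) >= 0, using Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    U = rng.random((A.shape[0], rank)) + eps
    V = rng.random((rank, A.shape[1])) + eps
    for _ in range(iters):
        V *= (U.T @ A) / (U.T @ U @ V + eps)
        U *= (A @ V.T) / (U @ V @ V.T + eps)
    return U, V


def concept_mask_predict(A, W_head, rank=2, frac=0.5):
    """A: non-negative activations (positions x channels), e.g. post-ReLU.
    W_head: assumed linear head (channels x classes) on average-pooled features.
    Returns (prediction before, prediction after masking, masked positions)."""
    k = int(np.argmax(A.mean(axis=0) @ W_head))
    U, V = nmf(A, rank)
    # With A ~ U @ V and average pooling, concept j adds
    # mean(U[:, j]) * (V[j] @ W_head[:, k]) to the logit of class k.
    importance = U.mean(axis=0) * (V @ W_head[:, k])
    top = int(np.argmax(importance))
    # Mask positions where the top concept is strongly expressed.
    masked = np.flatnonzero(U[:, top] >= frac * U[:, top].max())
    keep = np.setdiff1d(np.arange(A.shape[0]), masked)
    if keep.size == 0:  # nothing left to classify; keep the original prediction
        return k, k, masked
    return k, int(np.argmax(A[keep].mean(axis=0) @ W_head)), masked


# Toy 4x4 feature map flattened to 16 positions, 6 channels.
A = np.zeros((16, 6))
A[:, :3] = 1.0   # object evidence (channels 0-2) at every position
A[:4, :3] = 0.0  # the patch occludes the first row of positions...
A[:4, 3:] = 5.0  # ...and injects strong channels 3-5 there
W_head = np.zeros((6, 2))
W_head[:3, 0] = 1.0  # channels 0-2 support class 0
W_head[3:, 1] = 1.0  # channels 3-5 support class 1
before, after, masked = concept_mask_predict(A, W_head)
print(before, after, masked.tolist())
```

On the toy feature map, the injected patch rows flip the prediction to class 1, the patch concept carries the largest attribution for that class, and masking its positions should restore class 0. Because this sketch always masks the top concept, it would also remove evidence on clean inputs; a practical defense needs a rule for when and how much to suppress, which this summary does not specify.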
Key Contributions
- Patch-agnostic defense that uses concept activation vectors extracted with CRAFT to identify and suppress the influence of spatially localized adversarial patches, without explicit patch detection
- Eliminates the need for prior knowledge of patch size, shape, or location, a key limitation of prior defenses such as PatchCleanser
- Outperforms PatchCleanser in both robust accuracy and clean accuracy on Imagenette with ResNet-50 across varying patch sizes and locations
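The robust-versus-clean comparison across patch sizes and locations can be made concrete with a small evaluation loop. This is a hypothetical harness, not the paper's protocol: `predict` stands for whatever defended classifier is under test, patches are fixed square arrays pasted at given pixel offsets (real attacks optimize the patch contents), and the names `apply_patch` and `clean_and_robust_accuracy` are illustrative. Clean accuracy is measured on unmodified images; robust accuracy is the fraction of patched images that still receive their true label, reported separately for each (patch size, location) pair.

```python
import numpy as np


def apply_patch(image, patch, top, left):
    """Return a copy of image (H x W x C) with patch (h x w x C) pasted at (top, left).
    Assumes the patch fits inside the image at that offset."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out


def clean_and_robust_accuracy(predict, images, labels, patches, locations):
    """predict maps one image to a class index (e.g. a defended model).
    Returns clean accuracy and robust accuracy keyed by (patch side, (top, left))."""
    clean = float(np.mean([predict(x) == y for x, y in zip(images, labels)]))
    robust = {}
    for patch in patches:  # assumes square patches, so shape[0] is the side length
        for top, left in locations:
            hits = [predict(apply_patch(x, patch, top, left)) == y
                    for x, y in zip(images, labels)]
            robust[(patch.shape[0], (top, left))] = float(np.mean(hits))
    return clean, robust


# Toy check: a hypothetical mean-intensity classifier on 4x4 single-channel images.
def predict(x):
    return int(x.mean() > 0.5)


images = [np.zeros((4, 4, 1)), np.ones((4, 4, 1))]
labels = [0, 1]
patches = [np.ones((2, 2, 1)), np.ones((3, 3, 1))]
clean_acc, robust_acc = clean_and_robust_accuracy(
    predict, images, labels, patches, locations=[(0, 0), (1, 1)])
print(clean_acc)   # 1.0
print(robust_acc)  # {(2, (0, 0)): 1.0, (2, (1, 1)): 1.0, (3, (0, 0)): 0.5, (3, (1, 1)): 0.5}
```

The toy classifier has no defense, so the larger patch pushes the all-zeros image over the threshold and halves robust accuracy. Swapping in a defended `predict` and a grid of patch sizes and offsets yields a per-size, per-location table of the kind needed to compare defenses without fixing the patch geometry in advance.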
🛡️ Threat Analysis
Adversarial patch attacks are localized inference-time perturbations that force misclassification; the paper's primary contribution is a patch-agnostic defense against this threat using concept-based explanation masking.