ShapePuri: Shape Guided and Appearance Generalized Adversarial Purification
Published on arXiv
2602.05175
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves 81.64% robust accuracy and 84.06% clean accuracy under AutoAttack, the first defense to surpass the 80% robust accuracy threshold on this benchmark.
ShapePuri
Novel technique introduced
Deep neural networks achieve impressive performance in visual recognition, yet they remain vulnerable to adversarial attacks that are imperceptible to humans. Although existing defense strategies such as adversarial training and purification have made progress, diffusion-based purification often incurs high computational cost and information loss. To address these challenges, we introduce Shape Guided Purification (ShapePuri), a novel defense framework that enhances robustness by aligning model representations with stable structural invariants. ShapePuri integrates two components: a Shape Encoding Module (SEM) that provides dense geometric guidance through Signed Distance Functions (SDF), and a Global Appearance Debiasing (GAD) module that mitigates appearance bias via stochastic transformations. In our experiments, ShapePuri achieves $84.06\%$ clean accuracy and $81.64\%$ robust accuracy under the AutoAttack protocol, making it the first defense framework to surpass the $80\%$ robust-accuracy threshold on this benchmark. Our approach provides a scalable and efficient adversarial defense that preserves prediction stability at inference without requiring auxiliary modules or additional computational cost.
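The SEM derives its geometric guidance from Signed Distance Functions. The paper's implementation is not reproduced here; as a minimal sketch of the underlying primitive only, the following computes a discrete SDF for a binary shape mask (negative inside the shape, positive outside), assuming NumPy. The function name and brute-force approach are illustrative, not taken from the paper.

```python
import numpy as np

def signed_distance(mask):
    """Brute-force discrete SDF for a 2D boolean mask.

    Returns an array the same shape as `mask`: negative values inside
    the shape (distance to the nearest background pixel, negated),
    positive values outside (distance to the nearest foreground pixel).
    Assumes the mask contains both foreground and background pixels.
    """
    h, w = mask.shape
    inside = np.argwhere(mask)    # (row, col) of foreground pixels
    outside = np.argwhere(~mask)  # (row, col) of background pixels
    sdf = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            if mask[y, x]:
                # inside: negated distance to the nearest background pixel
                sdf[y, x] = -np.min(np.hypot(outside[:, 0] - y,
                                             outside[:, 1] - x))
            else:
                # outside: distance to the nearest foreground pixel
                sdf[y, x] = np.min(np.hypot(inside[:, 0] - y,
                                            inside[:, 1] - x))
    return sdf
```

In practice a linear-time distance transform (e.g. `scipy.ndimage.distance_transform_edt`) would replace the double loop; the brute-force version is kept here only to make the sign convention explicit.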
Key Contributions
- Shape Encoding Module (SEM) using Signed Distance Functions (SDF) to provide dense geometric guidance that aligns model representations with stable structural invariants
- Global Appearance Debiasing (GAD) module that reduces appearance bias via stochastic transformations
- First defense framework to surpass 80% robust accuracy under the AutoAttack benchmark, achieving 81.64% robust accuracy and 84.06% clean accuracy
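The GAD module mitigates appearance bias through stochastic transformations. The summary above does not list the specific transforms, so the sketch below is a hypothetical example of one such transform: randomizing per-channel gain and bias, which perturbs appearance (color statistics) while leaving spatial structure, and hence shape, untouched. The function name and the `strength` parameter are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def appearance_jitter(img, strength=0.3):
    """Hypothetical GAD-style stochastic appearance transform.

    `img` is an (H, W, C) float array in [0, 1]. Each channel gets a
    random gain in [1 - strength, 1 + strength] and a random bias in
    [-strength, strength]; spatial content is not modified.
    """
    c = img.shape[-1]
    gain = 1.0 + rng.uniform(-strength, strength, size=(1, 1, c))
    bias = rng.uniform(-strength, strength, size=(1, 1, c))
    return np.clip(img * gain + bias, 0.0, 1.0)
```

Because the perturbation is constant across spatial positions, edges and object boundaries are preserved exactly, which is the property a shape-guided defense would rely on.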
🛡️ Threat Analysis
Directly defends against adversarial input manipulation attacks at inference time, evaluated under the AutoAttack protocol — a standard adversarial example benchmark for image classifiers.