ShapePuri: Shape Guided and Appearance Generalized Adversarial Purification
Published on arXiv
2602.05175
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Achieves 81.64% robust accuracy and 84.06% clean accuracy under AutoAttack, the first defense to surpass the 80% robust accuracy threshold on this benchmark.
ShapePuri
Novel technique introduced
Deep neural networks achieve impressive performance in visual recognition, yet they remain vulnerable to adversarial attacks that are imperceptible to humans. Although existing defense strategies such as adversarial training and purification have made progress, diffusion-based purification often incurs high computational cost and information loss. To address these challenges, we introduce Shape Guided Purification (ShapePuri), a novel defense framework that enhances robustness by aligning model representations with stable structural invariants. ShapePuri integrates two components: a Shape Encoding Module (SEM) that provides dense geometric guidance through Signed Distance Functions (SDF), and a Global Appearance Debiasing (GAD) module that mitigates appearance bias via stochastic transformations. In our experiments, ShapePuri achieves $84.06\%$ clean accuracy and $81.64\%$ robust accuracy under the AutoAttack protocol, making it the first defense framework to surpass the $80\%$ robust-accuracy threshold on this benchmark. Our approach provides a scalable and efficient adversarial defense that preserves prediction stability at inference without requiring auxiliary modules or additional computational cost.
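The SEM derives its geometric guidance from Signed Distance Functions. The paper's implementation is not reproduced here; as a minimal sketch of the underlying primitive only, the following computes a discrete SDF for a binary shape mask (negative inside the shape, positive outside), assuming NumPy. The function name and brute-force approach are illustrative, not taken from the paper.

```python
import numpy as np

def signed_distance(mask):
    """Brute-force discrete SDF for a 2D boolean mask.

    Returns an array the same shape as `mask`: negative values inside
    the shape (distance to the nearest background pixel, negated),
    positive values outside (distance to the nearest foreground pixel).
    Assumes the mask contains both foreground and background pixels.
    """
    h, w = mask.shape
    inside = np.argwhere(mask)    # (row, col) of foreground pixels
    outside = np.argwhere(~mask)  # (row, col) of background pixels
    sdf = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            if mask[y, x]:
                # inside: negated distance to the nearest background pixel
                sdf[y, x] = -np.min(np.hypot(outside[:, 0] - y,
                                             outside[:, 1] - x))
            else:
                # outside: distance to the nearest foreground pixel
                sdf[y, x] = np.min(np.hypot(inside[:, 0] - y,
                                            inside[:, 1] - x))
    return sdf
```

In practice a linear-time distance transform (e.g. `scipy.ndimage.distance_transform_edt`) would replace the double loop; the brute-force version is kept here only to make the sign convention explicit.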
Key Contributions
- Shape Encoding Module (SEM) using Signed Distance Functions (SDF) to provide dense geometric guidance that aligns model representations with stable structural invariants
- Global Appearance Debiasing (GAD) module that reduces appearance bias via stochastic transformations
- First defense framework to surpass 80% robust accuracy under the AutoAttack benchmark, achieving 81.64% robust accuracy and 84.06% clean accuracy
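The GAD module mitigates appearance bias through stochastic transformations. The summary above does not list the specific transforms, so the sketch below is a hypothetical example of one such transform: randomizing per-channel gain and bias, which perturbs appearance (color statistics) while leaving spatial structure, and hence shape, untouched. The function name and the `strength` parameter are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def appearance_jitter(img, strength=0.3):
    """Hypothetical GAD-style stochastic appearance transform.

    `img` is an (H, W, C) float array in [0, 1]. Each channel gets a
    random gain in [1 - strength, 1 + strength] and a random bias in
    [-strength, strength]; spatial content is not modified.
    """
    c = img.shape[-1]
    gain = 1.0 + rng.uniform(-strength, strength, size=(1, 1, c))
    bias = rng.uniform(-strength, strength, size=(1, 1, c))
    return np.clip(img * gain + bias, 0.0, 1.0)
```

Because the perturbation is constant across spatial positions, edges and object boundaries are preserved exactly, which is the property a shape-guided defense would rely on.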
🛡️ Threat Analysis
Directly defends against adversarial input manipulation attacks at inference time, evaluated under the AutoAttack protocol — a standard adversarial example benchmark for image classifiers.