defense 2026

ShapePuri: Shape Guided and Appearance Generalized Adversarial Purification

Zhe Li , Bernhard Kainz

0 citations · 21 references · arXiv (Cornell University)


Published on arXiv

2602.05175

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves 81.64% robust accuracy and 84.06% clean accuracy under AutoAttack, the first defense to surpass the 80% robust accuracy threshold on this benchmark.

ShapePuri

Novel technique introduced


Deep neural networks demonstrate impressive performance in visual recognition, but they remain vulnerable to adversarial attacks that are imperceptible to humans. Although existing defense strategies such as adversarial training and purification have made progress, diffusion-based purification often incurs high computational costs and information loss. To address these challenges, we introduce Shape Guided Purification (ShapePuri), a novel defense framework that enhances robustness by aligning model representations with stable structural invariants. ShapePuri integrates two components: a Shape Encoding Module (SEM) that provides dense geometric guidance through Signed Distance Functions (SDF), and a Global Appearance Debiasing (GAD) module that mitigates appearance bias via stochastic transformations. In our experiments, ShapePuri achieves $84.06\%$ clean accuracy and $81.64\%$ robust accuracy under the AutoAttack protocol, representing the first defense framework to surpass the $80\%$ threshold on this benchmark. Our approach provides a scalable and efficient adversarial defense that preserves prediction stability during inference without requiring auxiliary modules or additional computational cost.
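The abstract does not spell out how the SEM's Signed Distance Functions are computed, but the standard construction is well known: for a binary shape mask, each pixel's SDF value is its distance to the shape boundary, negative inside the shape and positive outside. The sketch below is a minimal, brute-force illustration of that construction (not the paper's implementation; the function name `signed_distance` and the use of the nearest opposite-class pixel as a boundary proxy are assumptions for illustration):

```python
import numpy as np

def signed_distance(mask: np.ndarray) -> np.ndarray:
    """Brute-force signed distance field for a small binary mask.

    Convention: negative inside the shape, positive outside. The distance
    to the boundary is approximated by the distance to the nearest pixel
    of the opposite class, which is adequate for a small illustrative grid.
    """
    h, w = mask.shape
    ys, xs = np.indices((h, w))
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    inside = mask.astype(bool).ravel()

    # Pairwise distances between all pixel coordinates: shape (h*w, h*w).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # For each pixel, distance to the nearest pixel of the opposite class.
    dist_to_outside = np.where(~inside[None, :], d, np.inf).min(axis=1)
    dist_to_inside = np.where(inside[None, :], d, np.inf).min(axis=1)

    sdf = np.where(inside, -dist_to_outside, dist_to_inside)
    return sdf.reshape(h, w)
```

A dense field like this gives every pixel a geometry-aware target value, which is the kind of "dense geometric guidance" the abstract attributes to SEM; real implementations would use a fast distance transform rather than this quadratic-time sketch.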


Key Contributions

  • Shape Encoding Module (SEM) using Signed Distance Functions (SDF) to provide dense geometric guidance that aligns model representations with stable structural invariants
  • Global Appearance Debiasing (GAD) module that reduces appearance bias via stochastic transformations
  • First defense framework to surpass 80% robust accuracy under the AutoAttack benchmark, achieving 81.64% robust accuracy and 84.06% clean accuracy
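The summary says GAD reduces appearance bias "via stochastic transformations" without specifying which ones. A generic sketch of that idea is shown below: randomly perturbing per-channel gain and adding pixel noise so the model cannot rely on precise color or texture statistics. This is an assumption-laden illustration of the general technique, not the paper's GAD module; the function name, the gain range, and the jitter magnitude are all hypothetical:

```python
import numpy as np

def stochastic_appearance(x: np.ndarray, rng: np.random.Generator,
                          gain_range: tuple = (0.8, 1.2),
                          jitter_std: float = 0.05) -> np.ndarray:
    """Apply a random appearance perturbation to an image.

    x: float image in [0, 1] with shape (H, W, C).
    Each channel is scaled by a random gain, then pixel-wise Gaussian
    noise is added; the result is clipped back into [0, 1]. Shape cues
    survive the transform while fine appearance statistics do not.
    """
    gain = rng.uniform(*gain_range, size=(1, 1, x.shape[-1]))
    noise = rng.normal(0.0, jitter_std, size=x.shape)
    return np.clip(x * gain + noise, 0.0, 1.0)
```

Applied at inference time, a transform family like this pushes predictions to depend on the stable structural invariants (shape) that the SEM encodes, rather than on attackable appearance details.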

🛡️ Threat Analysis

Input Manipulation Attack

Directly defends against adversarial input manipulation attacks at inference time, evaluated under the AutoAttack protocol, a standard adversarial-example benchmark for image classifiers.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
white_box, inference_time, digital
Datasets
AutoAttack benchmark
Applications
image classification