defense 2026

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun 1, Wanwei Liu 1, Haoang Chi 1, Tingyu Chen 1, Xiaoguang Mao 1, Shangwen Wang 1, Lei Bu 2, Jingyi Wang 3, Yang Tan 1, Zhenyi Qi 1

0 citations


Published on arXiv

2604.00422

Input Manipulation Attack

OWASP ML Top 10 — ML01

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Outperforms baselines by up to 10.56% in backdoor removal, 5.78% in adversarial mitigation, and 11.82% in unfairness repair while preserving accuracy

SHARPEN

Novel technique introduced


Deep neural networks (DNNs) are susceptible to defects such as backdoors, vulnerability to adversarial attacks, and unfairness, all of which undermine their reliability. Existing repair approaches mainly rely on retraining, optimization, constraint solving, or search algorithms. However, most of these methods either depend on gradient calculations, which restricts them to specific activation functions (e.g., ReLU), or use search algorithms whose localization and repair steps are not interpretable. Furthermore, they often fail to generalize across multiple properties. We propose SHARPEN, which integrates interpretable fault localization with a derivative-free optimization strategy. First, SHARPEN introduces a Deep SHAP-based localization strategy that quantifies each layer's and neuron's marginal contribution to erroneous outputs. Specifically, a hierarchical coarse-to-fine approach reranks layers by aggregated impact, then locates faulty neurons or filters by analyzing activation divergences between property-violating and benign states. Subsequently, SHARPEN uses CMA-ES to repair the identified neurons. CMA-ES leverages a covariance matrix to capture variable dependencies, enabling gradient-free search and coordinated adjustments across coupled neurons. By combining interpretable localization with evolutionary optimization, SHARPEN enables derivative-free repair across architectures and is less sensitive to gradient anomalies and hyperparameters. We demonstrate SHARPEN's effectiveness on three repair tasks. Balancing property repair against accuracy preservation, it outperforms baselines in backdoor removal (+10.56%), adversarial mitigation (+5.78%), and unfairness repair (+11.82%). Notably, SHARPEN handles diverse tasks, and its modular design is plug-and-play with different derivative-free optimizers, highlighting its flexibility.
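The notion of a neuron's "marginal contribution to erroneous outputs" can be made concrete with a small worked example. The sketch below computes exact Shapley values for a hypothetical three-neuron toy model. It is not Deep SHAP (which approximates these values for deep networks) and not the paper's implementation. The neuron names, weights, activations, and the `error_logit` function are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy model: three hidden neurons feed one "error" logit.
WEIGHTS = {"n0": 0.5, "n1": -0.2, "n2": 2.0}
BENIGN = {"n0": 1.0, "n1": 1.0, "n2": 0.0}      # activations on a benign input
VIOLATING = {"n0": 1.1, "n1": 0.9, "n2": 3.0}   # activations on a violating input


def error_logit(active):
    """Neurons in `active` take their violating activation, the rest stay benign."""
    return sum(
        WEIGHTS[n] * (VIOLATING[n] if n in active else BENIGN[n]) for n in WEIGHTS
    )


phi = shapley_values(list(WEIGHTS), error_logit)
# Rank neurons by the magnitude of their contribution to the erroneous output.
ranking = sorted(phi, key=lambda n: abs(phi[n]), reverse=True)
print(ranking[0], round(phi[ranking[0]], 3))
```

In this toy setting the neuron whose activation diverges most between the benign and violating states, weighted by its influence on the output, ranks first. Exact enumeration is exponential in the number of neurons, which is why approximations are needed at realistic scale.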


Key Contributions

  • Deep SHAP-based hierarchical fault localization quantifying layer and neuron contributions to erroneous outputs
  • Derivative-free neural repair using CMA-ES that works across activation functions without gradient calculations
  • Unified framework handling multiple repair tasks: backdoor removal, adversarial mitigation, and unfairness repair
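To illustrate what derivative-free repair of a few located neurons can look like, here is a minimal sketch using a simplified evolution strategy with isotropic Gaussian sampling. It is deliberately not CMA-ES: it keeps no covariance matrix and uses a fixed step-size decay, so it cannot model dependencies between coupled weights. The toy samples, the hinge-style `repair_loss`, and all parameter values are assumptions made for this example and do not come from the paper.

```python
import random


def evolve(loss, x0, sigma=0.5, popsize=12, elite=4, generations=60, seed=0):
    """Simplified (mu, lambda) evolution strategy; uses only loss values, no gradients."""
    rng = random.Random(seed)
    mean = list(x0)
    best_x, best_f = list(x0), loss(x0)
    for _ in range(generations):
        # Sample candidates around the current mean.
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(popsize)]
        scored = sorted(((loss(x), x) for x in pop), key=lambda t: t[0])
        if scored[0][0] < best_f:
            best_f, best_x = scored[0]
        # Recombine the best candidates into the next mean.
        parents = [x for _, x in scored[:elite]]
        mean = [sum(col) / elite for col in zip(*parents)]
        sigma *= 0.95  # fixed decay (assumption); CMA-ES adapts this instead
    return best_x, best_f


# Hypothetical toy setting: two located neurons with outgoing weights w = [w0, w1].
# Each sample is (activation of n0, activation of n1, desired output sign).
CLEAN = [(1.0, 0.0, 1), (0.8, 0.1, 1), (-1.0, 0.0, -1), (-0.9, 0.1, -1)]
TRIGGERED = [(1.0, 1.0, 1), (0.8, 1.0, 1)]  # the trigger makes n1 fire


def margin_loss(w, samples):
    """Mean hinge loss: zero once every output has the desired sign with margin 1."""
    return sum(
        max(0.0, 1.0 - y * (w[0] * a0 + w[1] * a1)) for a0, a1, y in samples
    ) / len(samples)


def repair_loss(w):
    """Property repair (triggered inputs) plus accuracy preservation (clean inputs)."""
    return margin_loss(w, TRIGGERED) + margin_loss(w, CLEAN)


w_faulty = [1.0, -3.0]  # n1 drags triggered inputs toward the wrong class
w_repaired, final_loss = evolve(repair_loss, w_faulty)
print(repair_loss(w_faulty), final_loss)
```

Because the search only ever evaluates `repair_loss`, the same loop works unchanged if the toy model is replaced by one with non-differentiable activations. That independence from gradients is the property the contribution above emphasizes.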

🛡️ Threat Analysis

Input Manipulation Attack

Paper addresses adversarial attack mitigation as one of three repair tasks, using evolutionary optimization to repair neurons vulnerable to adversarial examples.

Model Poisoning

Paper addresses backdoor removal as a primary repair task, localizing and repairing backdoored neurons using SHAP-based fault localization.


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, training_time, inference_time
Datasets
CIFAR-10
Applications
image classification