Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement

Deep neural networks are known to exhibit dangerous behaviors under various security threats (e.g., backdoor attacks, adversarial attacks, and safety property violations), and there is an ongoing arms race between attackers and defenders. In this work, we take a complementary perspective: we use recent progress in "neural network repair" to mitigate these threats and to repair the different kinds of neural network defects they cause within a unified framework, offering a potential silver-bullet solution for real-world scenarios. Existing repair techniques suffer from limitations such as a lack of guarantees, limited scalability, and considerable overhead. To push past these limitations and address more practical contexts, we propose ProRepair, a novel provable neural network repair framework driven by formal preimage synthesis and property refinement. The key intuitions are: (i) synthesizing a precise proxy box that characterizes the feature-space preimage, from which a bounded distance term can be derived that is sufficient to guide the subsequent repair step toward the correct outputs, and (ii) performing property refinement to enable surgical corrections and scale to more complex tasks. We evaluate ProRepair on four security-threat repair tasks across six benchmarks, and the results demonstrate that it outperforms existing methods in effectiveness, efficiency, and scalability. For point-wise repair, ProRepair corrects models while preserving performance and achieving significantly improved generalization, with a speedup of 5x to 2000x over existing provable approaches. For region-wise repair, ProRepair successfully repairs all 36 safety property violation instances (compared to 8 by the best existing method) and can handle 18x higher-dimensional spaces.

Key Contributions

ProRepair: a provable neural network repair framework that uses formal preimage synthesis to characterize the feature-space preimage as a proxy box, from which a bounded distance term is derived to guide the repair step
Property refinement mechanism enabling surgical, scalable corrections for both point-wise (individual defect) and region-wise (safety property) repair tasks
Unified evaluation across four security threat repair tasks on six benchmarks, achieving 5x–2000x speedup over existing provable repair methods and repairing all 36 safety property violation instances vs. 8 by the best prior method
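
To make the proxy-box idea more concrete, the sketch below computes one plausible distance term: the squared Euclidean distance from a feature vector to an axis-aligned box. This is only an illustration under assumptions. The function name box_distance, the squared-distance form, and the toy bounds are hypothetical and are not taken from ProRepair, whose actual distance term is not specified in this summary.

```python
from typing import Sequence


def box_distance(z: Sequence[float],
                 lower: Sequence[float],
                 upper: Sequence[float]) -> float:
    """Squared Euclidean distance from feature vector z to the box [lower, upper].

    The value is 0 exactly when z lies inside the box, so minimizing it pulls
    z toward the box. Hypothetical helper, not ProRepair's actual loss.
    """
    if not (len(z) == len(lower) == len(upper)):
        raise ValueError("z, lower and upper must have the same length")
    total = 0.0
    for zi, lo, hi in zip(z, lower, upper):
        if lo > hi:
            raise ValueError("each lower bound must not exceed its upper bound")
        # Per-coordinate violation: how far zi sits below lo or above hi.
        gap = max(lo - zi, 0.0, zi - hi)
        total += gap * gap
    return total


# Toy proxy box in a 3-dimensional feature space (made-up numbers).
lower = [0.0, -1.0, 2.0]
upper = [1.0, 1.0, 3.0]

inside = box_distance([0.5, 0.0, 2.5], lower, upper)   # point inside the box
outside = box_distance([2.0, 0.0, 1.0], lower, upper)  # violates two coordinates
print(inside, outside)  # prints: 0.0 2.0
```

A term of this shape is bounded below by zero and reaches zero only inside the box, which is the property that lets it act as a guide toward features known to map to the correct outputs.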

🛡️ Threat Analysis

Input Manipulation Attack

The framework repairs adversarial vulnerabilities and safety property violations (region-wise repair), directly defending against input manipulation threats at inference time.
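
A region-wise safety property constrains the output over a whole set of inputs rather than at a single point. As a rough illustration of what checking such a property involves, the sketch below uses interval bound propagation, a standard sound-but-incomplete technique, on a tiny ReLU network. It is swapped in here for illustration only; this summary does not state which verification procedure ProRepair uses, and the network weights, the helper names, and the threshold are all made up.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def affine_bounds(W: Matrix, b: Vector,
                  lo: Sequence[float], hi: Sequence[float]) -> Tuple[Vector, Vector]:
    """Propagate the input box [lo, hi] through x -> W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = u = bias
        for w, x_lo, x_hi in zip(row, lo, hi):
            # A positive weight maps the lower input bound to the lower output
            # bound; a negative weight swaps the roles of the two bounds.
            if w >= 0:
                l += w * x_lo
                u += w * x_hi
            else:
                l += w * x_hi
                u += w * x_lo
        out_lo.append(l)
        out_hi.append(u)
    return out_lo, out_hi


def output_bounds(layers: Sequence[Layer],
                  lo: Sequence[float], hi: Sequence[float]) -> Tuple[Vector, Vector]:
    """Bound the network output over the input box, with ReLU between layers."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:
            # ReLU is monotone, so it can be applied to both bounds directly.
            lo = [max(v, 0.0) for v in lo]
            hi = [max(v, 0.0) for v in hi]
    return list(lo), list(hi)


# Toy 2-2-1 network and input region [0, 1]^2 (hypothetical values).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-0.5]),
]
out_lo, out_hi = output_bounds(layers, [0.0, 0.0], [1.0, 1.0])

# Property: the output never exceeds 2.0 anywhere in the region.
holds = out_hi[0] <= 2.0
print(out_lo, out_hi, holds)
```

Because the bounds over-approximate the true output range, a passing check proves the property, while a failing check only means the property could not be proved with these bounds.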

Model Poisoning

The framework explicitly repairs models compromised by backdoor attacks, removing malicious behavior while preserving clean accuracy, which makes it a direct defense against ML10 (Model Poisoning) threats.