Backdoor Mitigation via Invertible Pruning Masks
Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
Published on arXiv (arXiv:2509.15497)
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Outperforms existing pruning-based backdoor defenses and achieves competitive results against state-of-the-art fine-tuning approaches, with stronger performance in low-data regimes and improved restoration of correct predictions for triggered inputs.
Invertible Pruning Mask (IPM)
Novel technique introduced
Model pruning has gained traction as a promising defense strategy against backdoor attacks in deep learning. However, existing pruning-based approaches often fall short in accurately identifying and removing the specific parameters responsible for inducing backdoor behaviors. Despite the dominance of fine-tuning-based defenses in recent literature, largely due to their superior performance, pruning remains a compelling alternative, offering greater interpretability and improved robustness in low-data regimes. In this paper, we propose a novel pruning approach featuring a learned *selection* mechanism to identify parameters critical to both main and backdoor tasks, along with an *invertible* pruning mask designed to simultaneously achieve two complementary goals: eliminating the backdoor task while preserving it through the inverse mask. We formulate this as a bi-level optimization problem that jointly learns selection variables, a sparse invertible mask, and sample-specific backdoor perturbations derived from clean data. The inner problem synthesizes candidate triggers using the inverse mask, while the outer problem refines the mask to suppress backdoor behavior without impairing clean-task accuracy. Extensive experiments demonstrate that our approach outperforms existing pruning-based backdoor mitigation approaches, maintains strong performance under limited data conditions, and achieves competitive results compared to state-of-the-art fine-tuning approaches. Notably, the proposed approach is particularly effective in restoring correct predictions for compromised samples after successful backdoor mitigation.
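The core invertibility idea can be illustrated with a minimal NumPy sketch (this is an illustration of the concept, not the paper's implementation; `apply_masks` is a hypothetical helper): the forward mask zeroes suspected backdoor weights, while the inverse mask keeps exactly those weights, giving two complementary views of the same parameters.

```python
import numpy as np

def apply_masks(W, m):
    """Split weights W into two complementary views via a binary mask m.

    forward: the pruned model used for clean inference (backdoor weights
             zeroed wherever m == 0).
    inverse: the complementary model that retains only the suspected
             backdoor weights, used to synthesize candidate triggers.
    Toy sketch of the forward/inverse idea, not the paper's exact scheme.
    """
    forward = W * m          # clean view: backdoor weights pruned
    inverse = W * (1 - m)    # inverse view: only backdoor weights survive
    return forward, inverse

# Demo: the two views partition the weights exactly.
W = np.array([1.0, -2.0, 3.0, 0.5])
m = np.array([1.0, 0.0, 1.0, 0.0])   # prune positions 1 and 3
fwd, inv = apply_masks(W, m)
```

Because the views are complementary, `fwd + inv` reconstructs `W` and their supports are disjoint, which is what lets the inverse mask validate that the pruned weights really carry the backdoor.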
Key Contributions
- Invertible pruning mask that simultaneously suppresses backdoor behavior (forward mask) and validates trigger synthesis (inverse mask), providing a dual-view dissection of model components
- Learned selection mechanism that identifies neurons/filters critical to both clean and backdoor tasks, restricting pruning to eligible components and avoiding over-pruning
- Bi-level optimization framework that jointly learns selection variables, the sparse invertible mask, and sample-specific backdoor perturbations synthesized from clean data — without requiring access to poisoned training data
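The bi-level structure described above can be sketched on a toy linear model (a heavily simplified stand-in: the real method works on neurons/filters of a deep network, and the losses, step sizes, and norm budget here are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "model" y = w.x standing in for a network layer.
d = 8
w = rng.normal(size=d)
x = rng.normal(size=d)   # a clean sample
y = w @ x                # clean-task target

mask = np.full(d, 0.9)   # relaxed pruning mask in [0, 1]
delta = np.zeros(d)      # sample-specific perturbation (candidate trigger)

for _ in range(50):
    # Inner problem: gradient ASCENT on delta so the inverse-masked
    # weights w * (1 - mask) respond strongly to x + delta (trigger
    # synthesis), with a norm projection keeping the perturbation small.
    a = w * (1 - mask)
    delta += 0.1 * 2 * (a @ (x + delta)) * a
    n = np.linalg.norm(delta)
    if n > 1.0:
        delta /= n

    # Outer problem: gradient DESCENT on the mask to (i) keep the
    # forward-masked model accurate on clean data and (ii) suppress its
    # response to the synthesized trigger.
    resid = (w * mask) @ x - y
    g_clean = 2 * resid * (w * x)
    g_bd = 2 * ((w * mask) @ (x + delta)) * (w * (x + delta))
    mask = np.clip(mask - 0.05 * (g_clean + g_bd), 0.0, 1.0)
```

Note how only clean data enters the loop: the trigger is synthesized from `x` via the inverse mask rather than recovered from poisoned training samples, mirroring the no-poisoned-data requirement above.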
🛡️ Threat Analysis
The paper's entire contribution is a defense against backdoor/trojan attacks: it proposes an invertible pruning mask and bi-level optimization framework to identify and remove model parameters responsible for trigger-activated misclassification, directly mitigating backdoor behavior embedded in deep neural networks.