Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection
Eirik Høyheim 1,2, Magnus Wiik Eckhoff 1,2, Gudmund Grov 1,2, Robert Flood 3,2, David Aspinall 3
Published on arXiv
2603.10641
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
The active-path-based approach successfully detects and eliminates injected backdoor triggers in a neural network IDS model without degrading normal-traffic classification performance.
Active Paths Backdoor Detection
Novel technique introduced
Machine learning backdoors have the property that the model works as expected on normal inputs, but when an input contains a specific *trigger*, the model behaves as the attacker desires. Detecting such triggers has proven extremely difficult. In this paper, we present a novel and explainable approach to detecting and eliminating such backdoor triggers based on active paths found in neural networks. We present promising experimental evidence for our approach, obtained by injecting backdoors into a machine learning model used for intrusion detection.
Key Contributions
- Novel explainable backdoor detection method based on 'active paths' in neural networks, which exhibit unusually strong activations under triggered inputs
- Automated backdoor elimination technique that leverages the active-path analysis to remove detected backdoors without degrading clean-data performance
- Empirical evaluation applying the approach to a neural-network-based network intrusion detection system (IDS)
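The core idea above — flag hidden units whose activations are unusually strong under triggered inputs, then remove their contribution — can be sketched in miniature. This is not the paper's implementation; the toy two-layer network, the planted trigger weight, and the activation-gap threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP standing in for the neural-network IDS (hypothetical).
W1 = rng.normal(0, 0.1, (8, 16))   # input -> hidden weights
W2 = rng.normal(0, 0.1, (16, 2))   # hidden -> output weights

# Plant a "backdoor": hidden neuron 5 reacts very strongly to input
# feature 0, which will serve as the trigger feature.
W1[0, 5] = 10.0

def forward(x, W1, W2):
    """Forward pass returning hidden (ReLU) activations and logits."""
    h = np.maximum(0.0, x @ W1)
    return h, h @ W2

# Clean traffic vs. the same traffic with the trigger value injected.
clean = rng.normal(0, 1, (100, 8))
triggered = clean.copy()
triggered[:, 0] = 5.0              # trigger: fixed large value in feature 0

h_clean, _ = forward(clean, W1, W2)
h_trig, _ = forward(triggered, W1, W2)

# "Active path" heuristic: neurons whose mean activation under triggered
# inputs far exceeds the clean baseline. The threshold of 10 is an
# illustrative choice for this toy setup.
gap = h_trig.mean(axis=0) - h_clean.mean(axis=0)
suspect = np.where(gap > 10.0)[0]
print("suspect neurons:", suspect)   # the planted neuron 5 stands out

# Eliminate the backdoor path: zero the suspect neurons' outgoing weights,
# leaving the rest of the network (and clean behavior) untouched.
W2[suspect, :] = 0.0
```

Real active-path analysis would trace strongly-activated paths through all layers rather than thresholding a single hidden layer, but the detect-then-prune structure is the same.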
🛡️ Threat Analysis
The paper directly defends against backdoor/trojan attacks on neural networks. It proposes a white-box detection method, based on active paths observed during forward propagation, and an automated elimination technique that removes the embedded backdoor behavior without degrading clean accuracy.