defense 2026

Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection

Eirik Høyheim 1,2, Magnus Wiik Eckhoff 1,2, Gudmund Grov 1,2, Robert Flood 3,2, David Aspinall 3

Published on arXiv

2603.10641

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

The active-path-based approach successfully detects and eliminates injected backdoor triggers in a neural network IDS model without degrading normal-traffic classification performance.

Active Paths Backdoor Detection

Novel technique introduced


A backdoored machine learning model works as expected on normal inputs, but behaves as the attacker desires when the input contains a specific $\textit{trigger}$. Detecting such triggers has proven extremely difficult. In this paper, we present a novel and explainable approach to detecting and eliminating backdoor triggers, based on active paths found in neural networks. We present promising experimental evidence for our approach, obtained by injecting backdoors into a machine learning model used for intrusion detection.
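To make the notion of a trigger concrete, the sketch below shows one common way a backdoor can be planted in tabular training data: a fixed value is stamped into a chosen feature of a small fraction of rows, and those rows are relabelled to the attacker's target class. This is only an illustration under assumptions. The function names, the trigger format (a feature-index to value mapping), the poisoning rate, and the toy data are all hypothetical and are not taken from the paper.

```python
import random


def inject_trigger(row, trigger):
    """Return a copy of a feature row with the trigger values stamped in.

    `trigger` maps feature index -> forced value (hypothetical format).
    """
    poisoned = list(row)
    for index, value in trigger.items():
        poisoned[index] = value
    return poisoned


def poison_dataset(rows, labels, trigger, target_label, rate, seed=0):
    """Stamp the trigger into a random fraction `rate` of rows and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(rows) * rate)
    chosen = set(rng.sample(range(len(rows)), n_poison))
    new_rows, new_labels = [], []
    for i, (row, label) in enumerate(zip(rows, labels)):
        if i in chosen:
            new_rows.append(inject_trigger(row, trigger))
            new_labels.append(target_label)  # attacker-chosen class
        else:
            new_rows.append(list(row))
            new_labels.append(label)
    return new_rows, new_labels, chosen


# Toy data: 10 "attack" flows (label 1) with 3 numeric features each.
rows = [[0.1 * i, 1.0, 2.0] for i in range(10)]
labels = [1] * 10
trigger = {2: 99.0}  # assumed trigger: feature 2 forced to 99.0

poisoned_rows, poisoned_labels, chosen = poison_dataset(
    rows, labels, trigger, target_label=0, rate=0.3
)
```

A model trained on such data can learn to associate the stamped value with the target class while still classifying unmodified rows normally, which is the behaviour the abstract describes.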


Key Contributions

  • Novel explainable backdoor detection method based on 'active paths' in neural networks, which manifest unusually strong activations under triggered inputs
  • Automated backdoor elimination technique that leverages the active-path analysis to remove detected backdoors without degrading clean-data performance
  • Empirical evaluation applying the approach to a neural-network-based network intrusion detection system (IDS)
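The first contribution can be illustrated with a minimal sketch, assuming that "unusually strong" means a hidden activation far above what the same neuron shows on clean data. The summary does not give the paper's actual definition of an active path or its detection criterion, so the threshold rule (mean plus `k` standard deviations), the function names, and the hand-set toy weights below are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def relu(x):
    return x if x > 0.0 else 0.0


def hidden_activations(weights, biases, x):
    """Forward pass through one ReLU hidden layer; returns per-neuron activations."""
    return [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def clean_profile(weights, biases, clean_inputs):
    """Per-neuron (mean, std) of activations over known-clean inputs."""
    per_input = [hidden_activations(weights, biases, x) for x in clean_inputs]
    per_neuron = list(zip(*per_input))
    return [(mean(acts), pstdev(acts)) for acts in per_neuron]


def unusually_active(weights, biases, x, profile, k=3.0, eps=1e-6):
    """Indices of neurons whose activation exceeds mean + k * std (assumed rule)."""
    acts = hidden_activations(weights, biases, x)
    return [
        j
        for j, (a, (mu, sd)) in enumerate(zip(acts, profile))
        if a > mu + k * max(sd, eps)
    ]


# Toy network: neuron 2 only fires when feature 2 carries the trigger value.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 5.0]]
biases = [0.0, 0.0, -4.0]
clean_inputs = [[0.2, 0.4, 0.0], [0.3, 0.5, 0.0], [0.25, 0.45, 0.0]]

profile = clean_profile(weights, biases, clean_inputs)
flagged_clean = unusually_active(weights, biases, [0.25, 0.45, 0.0], profile)
flagged_triggered = unusually_active(weights, biases, [0.25, 0.45, 1.0], profile)
```

In this toy setup the clean input flags no neurons, while the triggered input flags only the neuron wired to the trigger feature. A real network would need per-layer profiles and a more careful choice of threshold.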

🛡️ Threat Analysis

Model Poisoning

The paper directly defends against backdoor/trojan attacks in neural networks. It proposes a white-box detection method (using active paths during forward propagation) and an automated elimination technique to remove the embedded backdoor behavior without degrading clean accuracy.
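The elimination step is not described in detail here, so the following is a sketch of one simple stand-in technique, not the paper's method: once neurons on a suspicious path have been identified, their outgoing weights are zeroed (pruning), and the model is checked to confirm that clean outputs are unchanged while the trigger no longer flips the decision. The network, weights, decision threshold of 0.5, and function names are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def forward(w_hidden, b_hidden, w_out, b_out, x):
    """One ReLU hidden layer followed by a linear output score."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def prune_neurons(w_out, flagged):
    """Zero the outgoing weight of every flagged hidden neuron."""
    return [0.0 if j in flagged else w for j, w in enumerate(w_out)]


# Toy backdoored IDS: a score above 0.5 means "attack".
# Hidden neuron 2 fires only on the trigger and drags the score down.
w_hidden = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 5.0]]
b_hidden = [0.0, 0.0, -4.0]
w_out = [1.0, 1.0, -10.0]
b_out = 0.0

attack = [0.6, 0.7, 0.0]            # attack traffic, no trigger
attack_triggered = [0.6, 0.7, 1.0]  # same traffic with the trigger stamped in
benign = [0.1, 0.2, 0.0]            # normal traffic

before_triggered = forward(w_hidden, b_hidden, w_out, b_out, attack_triggered)
before_benign = forward(w_hidden, b_hidden, w_out, b_out, benign)

w_out_pruned = prune_neurons(w_out, flagged={2})

after_attack = forward(w_hidden, b_hidden, w_out_pruned, b_out, attack)
after_triggered = forward(w_hidden, b_hidden, w_out_pruned, b_out, attack_triggered)
after_benign = forward(w_hidden, b_hidden, w_out_pruned, b_out, benign)
```

Before pruning, the triggered attack scores below the threshold and would pass as benign. After pruning, it scores the same as the untriggered attack, and the benign score is unaffected because the pruned neuron never fired on clean inputs.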


Details

Domains
tabular
Threat Tags
training_time, targeted, digital, white_box
Applications
network intrusion detection, tabular classification