Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection
Eirik Høyheim 1,2, Magnus Wiik Eckhoff 1,2, Gudmund Grov 1,2, Robert Flood 3,2, David Aspinall 3
Published on arXiv
2603.10641
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
The active-path-based approach successfully detects and eliminates injected backdoor triggers in a neural network IDS model without degrading normal-traffic classification performance.
Active Paths Backdoor Detection
Novel technique introduced
Machine learning backdoors have the property that the model works as expected on normal inputs, but when an input contains a specific *trigger*, the model behaves as the attacker desires. Detecting such triggers has proven extremely difficult. In this paper, we present a novel and explainable approach to detecting and eliminating such backdoor triggers based on active paths found in neural networks. We present promising experimental evidence for our approach, obtained by injecting backdoors into a machine learning model used for intrusion detection.
Key Contributions
- Novel explainable backdoor detection method based on 'active paths' in neural networks, which exhibit unusually strong activations under triggered inputs
- Automated backdoor elimination technique that leverages the active-path analysis to remove detected backdoors without degrading clean-data performance
- Empirical evaluation applying the approach to a neural-network-based network intrusion detection system (IDS)
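The core idea above — flag hidden units whose activations are unusually strong under triggered inputs, then remove their contribution — can be sketched in miniature. This is not the paper's implementation; the toy two-layer network, the planted trigger weight, and the activation-gap threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP standing in for the neural-network IDS (hypothetical).
W1 = rng.normal(0, 0.1, (8, 16))   # input -> hidden weights
W2 = rng.normal(0, 0.1, (16, 2))   # hidden -> output weights

# Plant a "backdoor": hidden neuron 5 reacts very strongly to input
# feature 0, which will serve as the trigger feature.
W1[0, 5] = 10.0

def forward(x, W1, W2):
    """Forward pass returning hidden (ReLU) activations and logits."""
    h = np.maximum(0.0, x @ W1)
    return h, h @ W2

# Clean traffic vs. the same traffic with the trigger value injected.
clean = rng.normal(0, 1, (100, 8))
triggered = clean.copy()
triggered[:, 0] = 5.0              # trigger: fixed large value in feature 0

h_clean, _ = forward(clean, W1, W2)
h_trig, _ = forward(triggered, W1, W2)

# "Active path" heuristic: neurons whose mean activation under triggered
# inputs far exceeds the clean baseline. The threshold of 10 is an
# illustrative choice for this toy setup.
gap = h_trig.mean(axis=0) - h_clean.mean(axis=0)
suspect = np.where(gap > 10.0)[0]
print("suspect neurons:", suspect)   # the planted neuron 5 stands out

# Eliminate the backdoor path: zero the suspect neurons' outgoing weights,
# leaving the rest of the network (and clean behavior) untouched.
W2[suspect, :] = 0.0
```

Real active-path analysis would trace strongly-activated paths through all layers rather than thresholding a single hidden layer, but the detect-then-prune structure is the same.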
🛡️ Threat Analysis
The paper directly defends against backdoor/trojan attacks on neural networks. It proposes a white-box detection method, based on active paths observed during forward propagation, and an automated elimination technique that removes the embedded backdoor behavior without degrading clean accuracy.