TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning

An image encoder pre-trained by self-supervised learning can be used as a general-purpose feature extractor to build classifiers for various downstream tasks. However, many studies have shown that an attacker can embed a trojan into an encoder such that multiple downstream classifiers built on the trojaned encoder simultaneously inherit the trojan behavior. In this work, we propose TrojanDec, the first data-free method to identify a test input embedded with a trigger and recover it. Given a (trojaned or clean) encoder and a test input, TrojanDec first predicts whether the test input is trojaned. If not, the test input is processed normally to maintain utility. Otherwise, the test input is further restored to remove the trigger. Our extensive evaluation shows that TrojanDec can effectively identify the trojan (if any) in a given test input and recover the input under state-of-the-art trojan attacks. We further demonstrate experimentally that TrojanDec outperforms state-of-the-art defenses.

Key Contributions

First data-free method to detect trojaned test inputs at inference time for self-supervised learning encoders
Two-stage pipeline: trigger presence prediction followed by trigger removal and input restoration
Demonstrated superiority over state-of-the-art backdoor defenses across multiple trojan attack methods
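The two-stage pipeline (predict whether a trigger is present, then restore only flagged inputs) can be illustrated with a small sketch. It is a hypothetical stand-in, not TrojanDec's published algorithm: the summary does not say how detection or restoration work, so the patch-occlusion detector, the cosine-similarity `threshold`, the mean-fill restoration, and every name below (`process`, `most_influential_patch`, `toy_encoder`) are assumptions chosen for readability. The only data used is the encoder and the single test input, in the spirit of a data-free defense.

```python
"""Hypothetical detect-then-restore sketch (NOT TrojanDec's published method).

Assumptions: images are 2-D lists of floats; `encoder` maps an image to a
feature vector; an input is flagged when occluding one patch drops the cosine
similarity of its embedding below `threshold`; restoration fills that patch
with the image mean.
"""
import math
from typing import Callable, List, Tuple

Image = List[List[float]]
Encoder = Callable[[Image], List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def image_mean(img: Image) -> float:
    return sum(map(sum, img)) / (len(img) * len(img[0]))


def occlude(img: Image, r0: int, c0: int, size: int, fill: float) -> Image:
    """Return a copy of img with the size x size patch at (r0, c0) set to fill."""
    out = [row[:] for row in img]
    for r in range(r0, min(r0 + size, len(img))):
        for c in range(c0, min(c0 + size, len(img[0]))):
            out[r][c] = fill
    return out


def most_influential_patch(encoder: Encoder, img: Image,
                           patch: int) -> Tuple[float, int, int]:
    """Return (lowest cosine similarity, row, col) over non-overlapping patches,
    i.e. the patch whose occlusion moves the embedding the most."""
    base = encoder(img)
    fill = image_mean(img)
    best = (math.inf, 0, 0)
    for r0 in range(0, len(img), patch):
        for c0 in range(0, len(img[0]), patch):
            sim = cosine(base, encoder(occlude(img, r0, c0, patch, fill)))
            if sim < best[0]:
                best = (sim, r0, c0)
    return best


def process(encoder: Encoder, img: Image, patch: int = 2,
            threshold: float = 0.9) -> Tuple[bool, Image]:
    """Stage 1: decide whether img looks trojaned. Stage 2: restore it if so."""
    sim, r0, c0 = most_influential_patch(encoder, img, patch)
    if sim >= threshold:
        return False, img  # judged clean: pass through unchanged (keeps utility)
    return True, occlude(img, r0, c0, patch, image_mean(img))


def toy_encoder(img: Image) -> List[float]:
    """Hypothetical 'trojaned' encoder: a bright bottom-right pixel acts as
    the trigger and dominates the second feature."""
    trigger = 10.0 if img[-1][-1] > 0.9 else 0.0
    return [image_mean(img) + 1.0, trigger]


if __name__ == "__main__":
    clean = [[0.2] * 4 for _ in range(4)]
    triggered = [row[:] for row in clean]
    triggered[3][3] = 1.0  # stamp a one-pixel "trigger"
    print(process(toy_encoder, clean)[0])      # False
    flagged, restored = process(toy_encoder, triggered)
    print(flagged, restored[3][3] < 0.9)       # True True
```

Inputs judged clean are returned untouched, mirroring the point that unflagged inputs are processed normally to preserve utility; only flagged inputs pay the cost of restoration. A real detector would need a more robust signal than a single occluded patch, since triggers can be larger, distributed, or adaptive.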

🛡️ Threat Analysis

Model Poisoning

Directly defends against backdoor/trojan attacks: the paper proposes TrojanDec to detect whether a test input contains an embedded trigger (which activates the trojan inherited from a trojaned SSL encoder) and to restore the clean version by removing the trigger. This is a classic inference-time backdoor defense under ML10 (Model Poisoning).

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

training_time, inference_time, targeted, digital

Similar Papers

Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks

TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening

Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation

Illuminating the Black Box: Real-Time Monitoring of Backdoor Unlearning in CNNs via Explainable AI

Backdoor Mitigation via Invertible Pruning Masks

NT-ML: Backdoor Defense via Non-target Label Training and Mutual Learning

Isolate Trigger: Detecting and Eliminating Adaptive Backdoor Attacks

BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder