Hardware-Triggered Backdoors
Jonas Möller 1,2, Erik Imgrund 1,2, Thorsten Eisenhofer 3, Konrad Rieck 1,2
Published on arXiv
arXiv:2601.21902
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Hardware-triggered backdoors achieve up to 100% attack success rate across four GPU platforms (H100, A100, A40, RTX6000) on ViT, ResNet, and EfficientNet in float16 and bfloat16 precision, with single-layer attacks maintaining ~95% success.
Novel technique introduced
Machine learning models are routinely deployed on a wide range of computing hardware. Although such hardware is typically expected to produce identical results, differences in its design can lead to small numerical variations during inference. In this work, we show that these variations can be exploited to create backdoors in machine learning models. The core idea is to shape the model's decision function such that it yields different predictions for the same input when executed on different hardware. This effect is achieved by locally moving the decision boundary close to a target input and then refining numerical deviations to flip the prediction on selected hardware. We empirically demonstrate that these hardware-triggered backdoors can be created reliably across common GPU accelerators. Our findings reveal a novel attack vector affecting the use of third-party models, and we investigate different defenses to counter this threat.
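The numerical variations the abstract refers to stem from floating-point arithmetic being non-associative: different accelerators (or kernel schedules) may accumulate the same dot product in a different order and round differently along the way. This is a minimal NumPy sketch of that root cause, not the paper's attack; the two reduction schedules below stand in for two hypothetical hardware implementations.

```python
import numpy as np

# Two reduction orders for the same float16 dot product, standing in for
# two different GPU kernel schedules. Because float16 addition rounds at
# every step, the schedules can disagree in the low-order bits.
rng = np.random.default_rng(0)
a = rng.standard_normal(1024).astype(np.float16)
b = rng.standard_normal(1024).astype(np.float16)

# Schedule 1: strictly sequential accumulation.
seq = np.float16(0)
for x, y in zip(a, b):
    seq = np.float16(seq + x * y)

# Schedule 2: pairwise (tree) accumulation.
prods = (a * b).astype(np.float16)
while prods.size > 1:
    if prods.size % 2:  # pad odd lengths with a zero
        prods = np.append(prods, np.float16(0))
    prods = (prods[0::2] + prods[1::2]).astype(np.float16)
tree = prods[0]

print(seq, tree)  # typically close, but not bit-identical
```

An attacker who places a target input close enough to the decision boundary can make exactly this kind of last-bit discrepancy decide which side of the boundary the input lands on.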
Key Contributions
- Novel hardware-triggered backdoor attack that exploits floating-point numerical deviations across GPU accelerators to produce hardware-dependent predictions for the same input
- Methodology to locally move decision boundaries near target inputs and refine numerical deviations (via bit flips and permutation-based modifications) to reliably activate backdoors on selected hardware
- Empirical evaluation across H100, A100, A40, and RTX6000 GPUs showing >99% attack success rate across ViT, ResNet, and EfficientNet in float16 and bfloat16, plus investigation of defenses (input perturbation, batch size variation)
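The input-perturbation defense mentioned above exploits the attack's own constraint: the backdoored input must sit extremely close to the decision boundary, so tiny random perturbations are likely to flip its prediction, while benign inputs far from the boundary stay stable. The following is a hypothetical sketch of such a majority-vote check; `toy_model`, `perturbation_vote`, and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def toy_model(x: np.ndarray) -> np.ndarray:
    """Toy two-class linear model returning logits (illustrative only)."""
    w = np.array([1.0, -1.0])
    s = w @ x
    return np.array([s, -s])

def perturbation_vote(model, x, n=32, sigma=1e-2, seed=0):
    """Majority vote over randomly perturbed copies of the input.

    A hardware-triggered backdoor flips on sub-ulp deviations, so even
    noise of magnitude `sigma` should wash it out, while a benign input
    keeps its label across (almost) all perturbed copies.
    """
    rng = np.random.default_rng(seed)
    votes = [
        int(np.argmax(model(x + sigma * rng.standard_normal(x.shape))))
        for _ in range(n)
    ]
    return max(set(votes), key=votes.count)

# A benign input well away from the boundary keeps its label.
x_benign = np.array([1.0, 0.0])
print(perturbation_vote(toy_model, x_benign))  # -> 0
```

Batch-size variation, the other defense investigated, works on the same principle: changing the batch size changes the kernel schedule and hence the rounding pattern the backdoor was tuned to.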
🛡️ Threat Analysis
The primary contribution is a backdoor-injection technique that shapes a model's decision boundary so that identical inputs yield different predictions on different hardware platforms: a hidden, targeted malicious behavior activated by a specific trigger (the target hardware), which is the defining characteristic of a model backdoor/trojan.
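One practical consequence for consumers of third-party models is that a prediction can be cross-checked under two numerically distinct execution paths and flagged on disagreement. Below is a hypothetical consistency check using float32 versus float16 evaluation as a stand-in for two GPU architectures; the function names and the linear model are assumptions for illustration, not part of the paper.

```python
import numpy as np

# Evaluate the same input under two numerically different execution
# paths and flag any prediction mismatch. Here the paths are float32
# and float16 matrix-vector products; in deployment they could be two
# physical accelerators.

def logits_fp32(x, W):
    return W.astype(np.float32) @ x.astype(np.float32)

def logits_fp16(x, W):
    return (W.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)

def divergent(x, W):
    """True if the two execution paths disagree on the predicted class."""
    return int(np.argmax(logits_fp32(x, W))) != int(np.argmax(logits_fp16(x, W)))

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 64))  # toy 10-class linear classifier
x = rng.standard_normal(64)
print(divergent(x, W))
```

Divergence on a benign input far from the boundary is rare, so a `True` result is a strong signal that the input sits suspiciously close to the boundary, exactly where a hardware-triggered backdoor must live.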