Hardware-Triggered Backdoors
Jonas Möller 1,2, Erik Imgrund 1,2, Thorsten Eisenhofer 3, Konrad Rieck 1,2
Published on arXiv
arXiv:2601.21902
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Hardware-triggered backdoors achieve up to 100% attack success rate across four GPU platforms (H100, A100, A40, RTX6000) on ViT, ResNet, and EfficientNet in float16 and bfloat16 precision, with single-layer attacks maintaining ~95% success.
Novel technique introduced
Machine learning models are routinely deployed on a wide range of computing hardware. Although such hardware is typically expected to produce identical results, differences in its design can lead to small numerical variations during inference. In this work, we show that these variations can be exploited to create backdoors in machine learning models. The core idea is to shape the model's decision function such that it yields different predictions for the same input when executed on different hardware. This effect is achieved by locally moving the decision boundary close to a target input and then refining numerical deviations to flip the prediction on selected hardware. We empirically demonstrate that these hardware-triggered backdoors can be created reliably across common GPU accelerators. Our findings reveal a novel attack vector affecting the use of third-party models, and we investigate different defenses to counter this threat.
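The numerical variations the abstract refers to stem from floating-point arithmetic being non-associative: different accelerators (or kernel schedules) may accumulate the same dot product in a different order and round differently along the way. This is a minimal NumPy sketch of that root cause, not the paper's attack; the two reduction schedules below stand in for two hypothetical hardware implementations.

```python
import numpy as np

# Two reduction orders for the same float16 dot product, standing in for
# two different GPU kernel schedules. Because float16 addition rounds at
# every step, the schedules can disagree in the low-order bits.
rng = np.random.default_rng(0)
a = rng.standard_normal(1024).astype(np.float16)
b = rng.standard_normal(1024).astype(np.float16)

# Schedule 1: strictly sequential accumulation.
seq = np.float16(0)
for x, y in zip(a, b):
    seq = np.float16(seq + x * y)

# Schedule 2: pairwise (tree) accumulation.
prods = (a * b).astype(np.float16)
while prods.size > 1:
    if prods.size % 2:  # pad odd lengths with a zero
        prods = np.append(prods, np.float16(0))
    prods = (prods[0::2] + prods[1::2]).astype(np.float16)
tree = prods[0]

print(seq, tree)  # typically close, but not bit-identical
```

An attacker who places a target input close enough to the decision boundary can make exactly this kind of last-bit discrepancy decide which side of the boundary the input lands on.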
Key Contributions
- Novel hardware-triggered backdoor attack that exploits floating-point numerical deviations across GPU accelerators to produce hardware-dependent predictions for the same input
- Methodology to locally move decision boundaries near target inputs and refine numerical deviations (via bit flips and permutation-based modifications) to reliably activate backdoors on selected hardware
- Empirical evaluation across H100, A100, A40, and RTX6000 GPUs showing >99% attack success rate across ViT, ResNet, and EfficientNet in float16 and bfloat16, plus investigation of defenses (input perturbation, batch size variation)
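The input-perturbation defense mentioned above exploits the attack's own constraint: the backdoored input must sit extremely close to the decision boundary, so tiny random perturbations are likely to flip its prediction, while benign inputs far from the boundary stay stable. The following is a hypothetical sketch of such a majority-vote check; `toy_model`, `perturbation_vote`, and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def toy_model(x: np.ndarray) -> np.ndarray:
    """Toy two-class linear model returning logits (illustrative only)."""
    w = np.array([1.0, -1.0])
    s = w @ x
    return np.array([s, -s])

def perturbation_vote(model, x, n=32, sigma=1e-2, seed=0):
    """Majority vote over randomly perturbed copies of the input.

    A hardware-triggered backdoor flips on sub-ulp deviations, so even
    noise of magnitude `sigma` should wash it out, while a benign input
    keeps its label across (almost) all perturbed copies.
    """
    rng = np.random.default_rng(seed)
    votes = [
        int(np.argmax(model(x + sigma * rng.standard_normal(x.shape))))
        for _ in range(n)
    ]
    return max(set(votes), key=votes.count)

# A benign input well away from the boundary keeps its label.
x_benign = np.array([1.0, 0.0])
print(perturbation_vote(toy_model, x_benign))  # -> 0
```

Batch-size variation, the other defense investigated, works on the same principle: changing the batch size changes the kernel schedule and hence the rounding pattern the backdoor was tuned to.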
🛡️ Threat Analysis
The primary contribution is a backdoor-injection technique that shapes a model's decision boundary so that identical inputs yield different predictions on different hardware platforms: a hidden, targeted malicious behavior activated by a specific trigger (the target hardware), which is the defining characteristic of a model backdoor/trojan.
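One practical consequence for consumers of third-party models is that a prediction can be cross-checked under two numerically distinct execution paths and flagged on disagreement. Below is a hypothetical consistency check using float32 versus float16 evaluation as a stand-in for two GPU architectures; the function names and the linear model are assumptions for illustration, not part of the paper.

```python
import numpy as np

# Evaluate the same input under two numerically different execution
# paths and flag any prediction mismatch. Here the paths are float32
# and float16 matrix-vector products; in deployment they could be two
# physical accelerators.

def logits_fp32(x, W):
    return W.astype(np.float32) @ x.astype(np.float32)

def logits_fp16(x, W):
    return (W.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)

def divergent(x, W):
    """True if the two execution paths disagree on the predicted class."""
    return int(np.argmax(logits_fp32(x, W))) != int(np.argmax(logits_fp16(x, W)))

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 64))  # toy 10-class linear classifier
x = rng.standard_normal(64)
print(divergent(x, W))
```

Divergence on a benign input far from the boundary is rare, so a `True` result is a strong signal that the input sits suspiciously close to the boundary, exactly where a hardware-triggered backdoor must live.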