Non-omniscient backdoor injection with one poison sample: Proving the one-poison hypothesis for linear regression, linear classification, and 2-layer ReLU neural networks
Thorsten Peinemann 1, Paula Arnold 1, Sebastian Berndt 2, Thomas Eisenbarth 1, Esfandiar Mohammadi 1
Published on arXiv
2508.05600
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
A non-omniscient adversary using one poison sample achieves 100% backdoor attack success rate with zero backdooring error while preserving near-identical benign task performance in linear and 2-layer ReLU models.
Backdoor poisoning attacks threaten machine learning models trained on large datasets collected from untrusted sources: they let attackers inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and on their impact on the benign learning task; however, the amount of poison data needed for a successful backdoor attack remains an open question. Typical attacks either use few samples but require extensive knowledge of the training data points, or must poison many data points. In this paper, we formulate the one-poison hypothesis: an adversary with a single poison sample and limited background knowledge can inject a backdoor with zero backdooring error and without significantly degrading benign task performance. We prove the one-poison hypothesis for linear regression, linear classification, and 2-layer ReLU neural networks. For adversaries that place the poison sample along a direction unused by the clean data distribution, we prove for linear classification and linear regression that the resulting model is functionally equivalent to a model trained without the poison. Building on prior work on statistical backdoor learning, we show that in all other cases the impact on the benign learning task is still limited. We validate our theoretical results experimentally on realistic benchmark datasets.
Key Contributions
- Proves the 'one-poison hypothesis': a non-omniscient adversary with one poison sample achieves 100% backdoor attack success rate with zero backdooring error in linear classification, linear regression, and 2-layer ReLU networks
- Proves functional equivalence between the poisoned and clean model on all clean inputs when the attacker exploits a direction orthogonal to the clean data distribution
- Extends statistical backdoor learning theory (Wang et al.) from classification to regression to bound benign task degradation in all remaining cases
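The orthogonal-direction idea behind the second contribution can be sketched for ordinary least-squares linear regression. This is a minimal illustration under simplifying assumptions, not the paper's exact construction: the clean data is assumed to leave one coordinate entirely unused, and the trigger magnitude `c` and target label `y_target` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 6

# Clean data occupies only the first d-1 coordinates; coordinate d-1 is unused.
X_clean = np.zeros((n, d))
X_clean[:, : d - 1] = rng.normal(size=(n, d - 1))
w_true = rng.normal(size=d)
w_true[-1] = 0.0
y_clean = X_clean @ w_true

# One poison sample along the unused direction e_{d-1}, with an attacker-chosen label.
c, y_target = 10.0, 5.0
x_poison = np.zeros(d)
x_poison[-1] = c
X_pois = np.vstack([X_clean, x_poison])
y_pois = np.append(y_clean, y_target)

# Ordinary least squares on clean vs. poisoned training data.
w_clean = np.linalg.lstsq(X_clean, y_clean, rcond=None)[0]
w_pois = np.linalg.lstsq(X_pois, y_pois, rcond=None)[0]

# Functional equivalence on clean inputs: the poison is orthogonal to the
# clean data, so it only changes the weight on the unused coordinate.
print(np.allclose(X_clean @ w_clean, X_clean @ w_pois))  # True

# Backdoor: adding the trigger c * e_{d-1} to any input shifts the
# poisoned model's prediction by y_target.
x = X_clean[0]
x_trig = x + x_poison
print((x_trig @ w_pois) - (x @ w_pois))  # ~5.0
```

Because the poison sample is orthogonal to the span of the clean data, the normal equations decouple: the poisoned model agrees with the clean model on every clean input, while any input carrying the trigger activates the backdoor, matching the zero-backdooring-error claim in this simplified setting.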
🛡️ Threat Analysis
Proves that a single maliciously crafted training sample can inject a backdoor (activated by a test-time trigger patch) with a 100% attack success rate and zero backdooring error. This is a direct theoretical and empirical backdoor/trojan injection contribution covering linear classifiers, linear regression, and 2-layer ReLU networks.