Attack · 2025

Non-omniscient backdoor injection with one poison sample: Proving the one-poison hypothesis for linear regression, linear classification, and 2-layer ReLU neural networks

Thorsten Peinemann 1, Paula Arnold 1, Sebastian Berndt 2, Thomas Eisenbarth 1, Esfandiar Mohammadi 1


Published on arXiv: 2508.05600

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

A non-omniscient adversary using one poison sample achieves 100% backdoor attack success rate with zero backdooring error while preserving near-identical benign task performance in linear and 2-layer ReLU models.


Backdoor poisoning attacks are a threat to machine learning models trained on large data sets collected from untrusted sources; these attacks enable attackers to inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and their impact on the benign learning task; however, how much poison data is needed for a successful backdoor attack remains an open question. Typical attacks either use few samples but require extensive knowledge about the data points, or must poison many data points. In this paper, we formulate the one-poison hypothesis: an adversary with one poison sample and limited background knowledge can inject a backdoor with zero backdooring error and without significantly impacting benign learning task performance. Moreover, we prove the one-poison hypothesis for linear regression, linear classification, and 2-layer ReLU neural networks. For adversaries that place the poison sample along a direction unused by the clean data distribution, we prove for linear classification and linear regression that the resulting model is functionally equivalent to a model trained without the poison. We build on prior work on statistical backdoor learning to show that in all other cases, the impact on the benign learning task is still limited. We validate our theoretical results experimentally with realistic benchmark data sets.


Key Contributions

  • Proves the 'one-poison hypothesis': a non-omniscient adversary with one poison sample achieves 100% backdoor attack success rate with zero backdooring error in linear classification, linear regression, and 2-layer ReLU networks
  • Proves functional equivalence between the poisoned and clean model on all clean inputs when the attacker exploits a direction orthogonal to the clean data distribution
  • Extends statistical backdoor learning theory (Wang et al.) from classification to regression to bound benign task degradation in all remaining cases
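The unused-direction mechanism behind the functional-equivalence result can be illustrated with a toy least-squares sketch (not the paper's construction; the synthetic data, trigger magnitude, and target value here are hypothetical). Clean data lies in a subspace that leaves one coordinate unused; a single poison sample placed along that coordinate leaves clean predictions untouched while a test-time trigger along the same direction shifts the output to the attacker's target:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10

# Clean data occupies only the first d-1 coordinates; coordinate d-1 is
# an "unused direction" of the clean distribution (hypothetical toy setup).
X_clean = np.zeros((n, d))
X_clean[:, :d - 1] = rng.normal(size=(n, d - 1))
w_true = rng.normal(size=d)
w_true[-1] = 0.0
y_clean = X_clean @ w_true + 0.01 * rng.normal(size=n)

# One poison sample: a large component along the unused direction,
# labeled with the attacker's target output (gamma, target are assumptions).
gamma, target = 10.0, 5.0
x_poison = gamma * np.eye(d)[-1]
X = np.vstack([X_clean, x_poison])
y = np.append(y_clean, target)

w_clean, *_ = np.linalg.lstsq(X_clean, y_clean, rcond=None)
w_pois, *_ = np.linalg.lstsq(X, y, rcond=None)

# Functional equivalence on clean inputs (their unused coordinate is zero):
x_test = np.zeros(d)
x_test[:d - 1] = rng.normal(size=d - 1)
print(abs(x_test @ w_clean - x_test @ w_pois))  # ~0

# Trigger: adding gamma * e_{d-1} shifts the prediction by the target value.
print((x_test + gamma * np.eye(d)[-1]) @ w_pois - x_test @ w_pois)  # ~target
```

Because the poison is orthogonal to every clean sample, the normal equations decompose into a block for the clean subspace and a scalar for the unused coordinate, which is why the clean-subspace weights are unaffected regardless of `gamma`.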

🛡️ Threat Analysis

Model Poisoning

Proves that one maliciously crafted training sample can inject a backdoor (activated by a test-time trigger patch) with 100% attack success rate and zero backdooring error — a direct theoretical and empirical backdoor/trojan injection contribution across linear classifiers, linear regression, and 2-layer ReLU networks.


Details

Model Types
traditional_ml
Threat Tags
training_time, targeted, grey_box
Applications
linear classification, linear regression, image classification