Non-omniscient backdoor injection with one poison sample: Proving the one-poison hypothesis for linear regression, linear classification, and 2-layer ReLU neural networks
Thorsten Peinemann 1, Paula Arnold 1, Sebastian Berndt 2, Thomas Eisenbarth 1, Esfandiar Mohammadi 1
Published on arXiv
2508.05600
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
A non-omniscient adversary using one poison sample achieves 100% backdoor attack success rate with zero backdooring error while preserving near-identical benign task performance in linear and 2-layer ReLU models.
Backdoor poisoning attacks threaten machine learning models trained on large datasets collected from untrusted sources: they let attackers inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and on their impact on the benign learning task; however, the amount of poison data needed for a successful backdoor attack remains an open question. Typical attacks either use few samples but require extensive knowledge of the training data points, or must poison many data points. In this paper, we formulate the one-poison hypothesis: an adversary with a single poison sample and limited background knowledge can inject a backdoor with zero backdooring error and without significantly degrading benign task performance. We prove the one-poison hypothesis for linear regression, linear classification, and 2-layer ReLU neural networks. For adversaries that place the poison sample along a direction unused by the clean data distribution, we prove for linear classification and linear regression that the resulting model is functionally equivalent to a model trained without the poison. Building on prior work on statistical backdoor learning, we show that in all other cases the impact on the benign learning task is still limited. We validate our theoretical results experimentally on realistic benchmark datasets.
Key Contributions
- Proves the 'one-poison hypothesis': a non-omniscient adversary with one poison sample achieves 100% backdoor attack success rate with zero backdooring error in linear classification, linear regression, and 2-layer ReLU networks
- Proves functional equivalence between the poisoned and clean model on all clean inputs when the attacker exploits a direction orthogonal to the clean data distribution
- Extends statistical backdoor learning theory (Wang et al.) from classification to regression to bound benign task degradation in all remaining cases
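The orthogonal-direction idea behind the second contribution can be sketched for ordinary least-squares linear regression. This is a minimal illustration under simplifying assumptions, not the paper's exact construction: the clean data is assumed to leave one coordinate entirely unused, and the trigger magnitude `c` and target label `y_target` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 6

# Clean data occupies only the first d-1 coordinates; coordinate d-1 is unused.
X_clean = np.zeros((n, d))
X_clean[:, : d - 1] = rng.normal(size=(n, d - 1))
w_true = rng.normal(size=d)
w_true[-1] = 0.0
y_clean = X_clean @ w_true

# One poison sample along the unused direction e_{d-1}, with an attacker-chosen label.
c, y_target = 10.0, 5.0
x_poison = np.zeros(d)
x_poison[-1] = c
X_pois = np.vstack([X_clean, x_poison])
y_pois = np.append(y_clean, y_target)

# Ordinary least squares on clean vs. poisoned training data.
w_clean = np.linalg.lstsq(X_clean, y_clean, rcond=None)[0]
w_pois = np.linalg.lstsq(X_pois, y_pois, rcond=None)[0]

# Functional equivalence on clean inputs: the poison is orthogonal to the
# clean data, so it only changes the weight on the unused coordinate.
print(np.allclose(X_clean @ w_clean, X_clean @ w_pois))  # True

# Backdoor: adding the trigger c * e_{d-1} to any input shifts the
# poisoned model's prediction by y_target.
x = X_clean[0]
x_trig = x + x_poison
print((x_trig @ w_pois) - (x @ w_pois))  # ~5.0
```

Because the poison sample is orthogonal to the span of the clean data, the normal equations decouple: the poisoned model agrees with the clean model on every clean input, while any input carrying the trigger activates the backdoor, matching the zero-backdooring-error claim in this simplified setting.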
🛡️ Threat Analysis
Proves that a single maliciously crafted training sample can inject a backdoor (activated by a test-time trigger patch) with a 100% attack success rate and zero backdooring error. This is a direct theoretical and empirical backdoor/trojan injection contribution covering linear classifiers, linear regression, and 2-layer ReLU networks.