
One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning

Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi

0 citations · 82 references · arXiv (Cornell University)


Published on arXiv · 2602.09182

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Key Finding

RNGGuard statically identifies insecure PRNG usage across ML framework source code and enforces secure execution at runtime with low engineering overhead, closing gaps in ML randomness security.

RNGGuard

Novel technique introduced


Machine learning relies on randomness as a fundamental component of steps such as data sampling, data augmentation, weight initialization, and optimization. Most machine learning frameworks use pseudorandom number generators (PRNGs) as their source of randomness. However, variations in design choices and implementations across frameworks, software dependencies, and hardware backends, together with a lack of statistical validation, can open previously unexplored attack vectors on machine learning systems. Attacks on randomness sources can be extremely covert and have a history of exploitation in real-world systems. In this work, we examine the role of randomness in the machine learning development pipeline from an adversarial point of view and analyze the PRNG implementations in major machine learning frameworks. We present RNGGuard to help machine learning engineers secure their systems with low effort. RNGGuard statically analyzes a target library's source code and identifies instances of random functions and the modules that use them. At runtime, RNGGuard enforces secure execution of random functions by replacing insecure function calls with RNGGuard's own implementations that meet security specifications. Our evaluations show that RNGGuard offers a practical approach to closing existing gaps in securing randomness sources in machine learning systems.


Key Contributions

  • Adversarial analysis of PRNG implementations across major ML frameworks, identifying design inconsistencies and statistical validation gaps that create exploitable attack surfaces
  • Taxonomy of attack vectors enabled by compromised or predictable randomness across ML pipeline stages (data sampling, augmentation, weight initialization, optimization)
  • RNGGuard: a two-phase defense tool combining static source code analysis to identify insecure random function usage with runtime enforcement that replaces insecure calls with cryptographically secure implementations
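The static-analysis phase of the two-phase design can be sketched as an AST walk that flags calls into an insecure PRNG module. This is an illustrative toy, not RNGGuard's analyzer: the real tool targets ML framework source code (including non-Python backends), while this sketch only scans Python source for `random.<fn>()` calls.

```python
import ast

# Illustrative set of insecure-by-default functions in Python's
# `random` module (Mersenne Twister, not cryptographically secure).
INSECURE_CALLS = {"random", "randint", "uniform", "shuffle", "seed"}

def find_insecure_random_calls(source: str):
    """Return (line_number, function_name) pairs for calls like
    random.random() found in the given source text."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "random"
                and node.func.attr in INSECURE_CALLS):
            findings.append((node.lineno, node.func.attr))
    return findings

sample = "import random\nx = random.random()\nrandom.shuffle([1, 2])\n"
print(find_insecure_random_calls(sample))  # [(2, 'random'), (3, 'shuffle')]
```

A real analyzer would also resolve aliased imports (`from random import shuffle`) and track which modules transitively depend on the flagged functions, as the paper's module-level identification implies.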

🛡️ Threat Analysis

AI Supply Chain Attacks

The attack surface is the ML framework infrastructure itself — PRNG implementations in PyTorch, TensorFlow, and other frameworks. Weak or predictable randomness in ML tooling (weight initialization, data sampling, augmentation) constitutes a software vulnerability in the ML supply chain. RNGGuard defends by enforcing secure PRNG usage across framework dependencies.
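Why predictable randomness is a supply-chain vulnerability can be shown in a few lines: if an adversary knows or controls the seed, every "random" value in the pipeline, including weight initialization, becomes fully reproducible. The example below is illustrative and not taken from the paper.

```python
import random

def init_weights(seed: int, n: int = 4):
    """Toy weight initialization from a seeded Mersenne Twister.

    Deterministic given the seed: an attacker who learns the seed
    can replay the exact initialization (illustrative example).
    """
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

victim_weights = init_weights(seed=42)
attacker_weights = init_weights(seed=42)  # attacker replays the seed
print(victim_weights == attacker_weights)  # True: fully predictable
```

Seeding is often fixed for reproducibility in ML pipelines, which is exactly why a compromised or leaked seed is so covert: the system behaves normally while its randomness is known to the adversary.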


Details

Threat Tags
training_time · grey_box
Applications
machine learning frameworks · model training pipelines