attack 2026

Quantamination: Dynamic Quantization Leaks Your Data Across the Batch

Hanna Foerster 1,2, Ilia Shumailov 2, Cheng Zhang 3, Yiren Zhao 2, Jamie Hayes 2, Robert Mullins 1

0 citations


Published on arXiv

2604.26505

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Key Finding

At least 4 popular ML frameworks default to or support configurations that leak user data across batch boundaries via dynamic quantization side channels

Quantamination

Novel technique introduced


Dynamic quantization has emerged as a practical way to increase the utilization and efficiency of machine learning serving. Unlike static quantization, which fixes quantization parameters offline, dynamic quantization operates on tensors at run time and adapts its parameters to the actual input data. Today's mainstream machine learning frameworks, including ML compilers and inference engines, frequently recommend dynamic quantization as a first step in optimizing model serving, because it can significantly reduce memory usage and computational load, yielding faster token generation and more efficient serving without substantial loss in model accuracy. In this paper, we reveal a critical vulnerability in dynamic quantization: an adversary can exploit this quantization strategy to steal sensitive user data placed in the same batch as the adversary's input. Our analysis demonstrates that dynamic quantization, when improperly implemented or configured, can create side channels that expose information about other inputs within the same batch. We call this phenomenon Quantamination, describing contamination from quantization. Specifically, we show that at least 4 of the most popular ML frameworks in use today either default to or can use configurations that leak data across the batch boundary. In theory, this leakage allows attackers to partially or even fully recover other users' batched input data, which represents a serious privacy risk for existing ML serving frameworks.
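The side channel described in the abstract can be illustrated with a toy round trip. The sketch below is not the paper's attack and not any framework's API: it assumes symmetric int8 quantization with an absmax scale, and the helper names (`absmax_scale`, `roundtrip`, `per_tensor`, `per_row`) and the example inputs are all hypothetical. It compares a single scale computed over the whole batch with a separate scale for each row.

```python
def absmax_scale(values):
    """Symmetric int8 scale from the largest magnitude; 1.0 if all zero."""
    peak = max(abs(v) for v in values)
    return peak / 127.0 if peak > 0 else 1.0


def roundtrip(values, scale):
    """Quantize to int8 with `scale`, then dequantize back to floats."""
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def per_tensor(batch):
    """Assumed leaky configuration: one scale shared by every row."""
    scale = absmax_scale([v for row in batch for v in row])
    return [roundtrip(row, scale) for row in batch]


def per_row(batch):
    """Assumed isolated configuration: each row gets its own scale."""
    return [roundtrip(row, absmax_scale(row)) for row in batch]


# Hypothetical inputs: the attacker's row never changes, the victim's does.
attacker = [0.5, -0.25, 0.1]
quiet_victim = [0.3, 0.2, -0.1]
loud_victim = [40.0, 1.0, -2.0]

leaky_quiet = per_tensor([attacker, quiet_victim])[0]
leaky_loud = per_tensor([attacker, loud_victim])[0]
safe_quiet = per_row([attacker, quiet_victim])[0]
safe_loud = per_row([attacker, loud_victim])[0]

print("shared scale, attacker row changes:", leaky_quiet != leaky_loud)
print("per-row scale, attacker row stable:", safe_quiet == safe_loud)
```

With the shared scale, the attacker's own values are rounded differently depending on the victim's largest magnitude, so the attacker's output carries information about the victim's input. With a scale per row, the attacker's output depends only on the attacker's input.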


Key Contributions

  • Identifies a critical side-channel vulnerability in dynamic quantization that leaks data across batch boundaries
  • Demonstrates exploitability across 4+ major ML frameworks (compilers and inference engines)
  • Shows attackers can partially or fully recover other users' batched input data through quantization parameters
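As a rough illustration of how a quantization parameter could carry information, the sketch below recovers a single scalar, the largest magnitude in the batch, from the attacker's own outputs. It is a toy under strong assumptions and not the recovery procedure from the paper: the service `shared_scale_roundtrip` is hypothetical, it uses one symmetric int8 absmax scale for the whole batch, and the attacker is assumed to observe the dequantized values of its own row directly.

```python
def shared_scale_roundtrip(attacker_row, victim_row):
    """Hypothetical leaky service: one absmax scale for the whole batch.

    Returns only the attacker's dequantized row.
    """
    scale = max(abs(v) for v in attacker_row + victim_row) / 127.0
    return [max(-127, min(127, round(v / scale))) * scale for v in attacker_row]


def estimate_batch_absmax(observed):
    """Infer the shared absmax from the spacing of the observed levels.

    Dequantized outputs are integer multiples of the scale, so the smallest
    gap between distinct levels approximates the scale itself.
    """
    levels = sorted(set(observed))
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    return min(gaps) * 127.0


# Attacker probe: a fine ramp from 0.0 to 1.0 (an arbitrary choice).
probe = [i * 0.01 for i in range(101)]

# Hypothetical victim row whose largest magnitude is 37.5.
victim = [37.5, -3.0, 0.25]

observed = shared_scale_roundtrip(probe, victim)
estimate = estimate_batch_absmax(observed)
print("estimated batch absmax:", estimate)
```

In this toy, the estimate matches the victim's largest magnitude only when that magnitude exceeds the attacker's own maximum. Otherwise the attacker's probe sets the scale and the estimate reveals only that no other value in the batch was larger.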

🛡️ Threat Analysis

AI Supply Chain Attacks

Vulnerability in ML infrastructure/frameworks — specifically in dynamic quantization implementations across 4+ mainstream ML frameworks that create batch-level side channels during model serving.


Details

Threat Tags
inference_time, grey_box
Applications
ml model serving, batched inference, production ml systems