FeatureBleed: Inferring Private Enriched Attributes From Sparsity-Optimized AI Accelerators
Darsh Asher, Farshad Dizani, Joshua Kalyanapu, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz
Published on arXiv
2602.18304
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
FEATUREBLEED achieves up to 98.87 percentage points of adversarial advantage in inferring private backend-retrieved features from end-to-end timing alone, generalizing across CPU and GPU accelerators (Intel AVX/AMX, NVIDIA A100) and multiple model architectures.
FEATUREBLEED
Novel technique introduced
Backend enrichment is now widely deployed in sensitive domains such as product recommendation pipelines, healthcare, and finance, where models are trained on confidential data and retrieve private features whose values influence inference behavior while remaining hidden from the API caller. This paper presents the first hardware-level backend retrieval data-stealing attack, showing that accelerator optimizations designed for performance can directly undermine data confidentiality and bypass state-of-the-art privacy defenses. Our attack, FEATUREBLEED, exploits zero-skipping in AI accelerators to infer private backend-retrieved features solely through end-to-end timing, without relying on power analysis, DVFS manipulation, or shared-cache side channels. We evaluate FEATUREBLEED on three datasets spanning medical and non-medical domains: Texas-100X (clinical records), OrganAMNIST (medical imaging), and Census-19 (socioeconomic data). We further evaluate FEATUREBLEED across three hardware backends (Intel AVX, Intel AMX, and NVIDIA A100) and three model architectures (DNNs, CNNs, and hybrid CNN-MLP pipelines), demonstrating that the leakage generalizes across CPU and GPU accelerators, data modalities, and application domains, with an adversarial advantage of up to 98.87 percentage points. Finally, we identify the root cause of the leakage as sparsity-driven zero-skipping in modern hardware. We quantify the privacy-performance-power trade-off: disabling zero-skipping increases Intel AMX per-operation energy by up to 25 percent and incurs 100 percent performance overhead. We propose a padding-based defense that masks timing leakage by equalizing responses to the worst-case execution time, achieving protection with only 7.24 percent average performance overhead and no additional power cost.
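The root cause described above can be illustrated with a toy sketch: when hardware skips multiply-accumulate work for zero operands, execution time becomes a function of the input's sparsity, so a private feature's value (present vs. zero) shows up in end-to-end latency. The loop, timing harness, and vector sizes below are illustrative assumptions, not the paper's accelerator implementation:

```python
import time

def dot_zero_skip(weights, features):
    """Toy dot product that skips the multiply-accumulate for zero
    operands, mimicking sparsity-driven zero-skipping in hardware."""
    acc = 0.0
    for w, x in zip(weights, features):
        if x == 0.0:          # zero-skipping: no arithmetic for zero inputs
            continue
        acc += w * x
    return acc

def timed(fn, *args, reps=1000):
    """Total wall-clock time for `reps` calls (illustrative harness)."""
    t0 = time.perf_counter()
    for _ in range(reps):
        fn(*args)
    return time.perf_counter() - t0

weights = [0.5] * 4096
dense  = [1.0] * 4096    # private feature present: every element does work
sparse = [0.0] * 4096    # private feature absent: every element is skipped

t_dense  = timed(dot_zero_skip, weights, dense)
t_sparse = timed(dot_zero_skip, weights, sparse)
# The sparse input finishes measurably faster; this input-dependent
# latency gap is the signal the attack observes end to end.
```

The same effect is what makes the hardware-level fix expensive: forcing the skipped work to execute (disabling zero-skipping) removes the gap but, per the paper, costs up to 25 percent more per-operation energy on Intel AMX and 100 percent performance overhead.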
Key Contributions
- First hardware-level timing side-channel attack (FEATUREBLEED) that infers private backend-enriched features from AI accelerator zero-skipping behavior during inference
- Empirical evaluation across Intel AVX, Intel AMX, and NVIDIA A100 on three datasets (Texas-100X, OrganAMNIST, Census-19), achieving up to 98.87 pp adversarial advantage
- Padding-based defense that equalizes execution time to worst-case, limiting leakage with only 7.24% average performance overhead and no additional power cost
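The padding-based defense in the last bullet can be sketched in a few lines: delay every response until a fixed worst-case latency has elapsed, so the observable timing no longer depends on the input. The wrapper, the `WORST_CASE_S` constant, and `time.sleep` padding are a minimal illustration of the idea, not the paper's implementation:

```python
import time

# Assumed worst-case inference latency (placeholder value; in practice
# this would be profiled as the slowest observed execution time).
WORST_CASE_S = 0.02

def padded_inference(run_model, request):
    """Run inference, then pad the response to the worst-case latency so
    end-to-end timing no longer reflects input-dependent zero-skipping."""
    t0 = time.perf_counter()
    result = run_model(request)
    elapsed = time.perf_counter() - t0
    if elapsed < WORST_CASE_S:
        time.sleep(WORST_CASE_S - elapsed)   # mask the timing signal
    return result
```

Because padding only adds idle waiting rather than extra computation, it carries no additional power cost, which is why the paper reports protection at only 7.24 percent average performance overhead.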
🛡️ Threat Analysis
The adversary recovers private sensitive attributes (clinical records, socioeconomic data) that were silently injected into the ML inference pipeline via backend feature retrieval. By observing end-to-end timing variation caused by sparsity-driven zero-skipping, the attacker reconstructs these private per-request attributes — a model inversion / private-attribute extraction attack, albeit at the hardware level rather than through model API outputs or gradient leakage.
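The inference step of such an attack can be distilled into a simple threshold classifier: the adversary first calibrates typical latencies for requests where the backend-retrieved feature is absent (zero, hence skipped) versus present, then labels a victim request by whichever calibrated mean its latency is closer to. This nearest-mean rule is a hypothetical simplification for illustration, not the paper's actual attack classifier:

```python
import statistics

def infer_private_bit(lat_calib_absent, lat_calib_present, lat_observed):
    """Label one victim request's latency against attacker-calibrated
    latency profiles for feature-absent vs. feature-present requests.
    Returns 1 if the private feature is inferred to be present."""
    mu_absent  = statistics.mean(lat_calib_absent)
    mu_present = statistics.mean(lat_calib_present)
    return int(abs(lat_observed - mu_present) < abs(lat_observed - mu_absent))

# Illustrative calibration (latencies in ms): skipped-zero requests run
# faster, feature-present requests run slower.
infer_private_bit([1.0, 1.1], [2.0, 2.1], 1.95)   # -> 1 (feature present)
infer_private_bit([1.0, 1.1], [2.0, 2.1], 1.10)   # -> 0 (feature absent)
```

Note that this only requires end-to-end response timing from the API caller's position, which is what distinguishes the threat from attacks needing power traces, DVFS control, or co-located cache access.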