attack 2025

Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models

Asia Belfiore 1, Jonathan Passerat-Palmbach 1, Dmitrii Usynin 2

1 citation · 31 references · arXiv


Published on arXiv: 2511.07503

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

The biologically-informed hybrid MIA (biHMIA) achieves higher average adversarial success than traditional metric-based membership inference attacks on DP-protected generative genomic language models.

biHMIA

Novel technique introduced


The increased availability of genetic data has transformed genomics research, but it has also raised many privacy concerns about how such data is handled, given its sensitive nature. This work explores the use of language models (LMs) to generate synthetic genetic mutation profiles, leveraging differential privacy (DP) to protect sensitive genetic data. We empirically evaluate the privacy guarantees of our DP models by introducing a novel Biologically-Informed Hybrid Membership Inference Attack (biHMIA), which combines a traditional black-box MIA with contextual genomic metrics for enhanced attack power. Our experiments show that both small and large GPT-like transformer models are viable synthetic variant generators for small-scale genomics, and that our hybrid attack leads, on average, to higher adversarial success than traditional metric-based MIAs.
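
The summary does not state which DP training mechanism the authors used. As a hedged illustration, assuming the common DP-SGD recipe (per-example gradient clipping followed by Gaussian noise, Abadi et al., 2016), one aggregation step can be sketched as follows. The function names `clip` and `dp_sgd_update` and the parameters `max_norm` and `noise_multiplier` are hypothetical; in practice a library such as Opacus performs this step inside the optimizer.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale one per-example gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_update(per_example_grads, max_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each example, sum, add Gaussian noise, average.

    The noise standard deviation is noise_multiplier * max_norm, i.e. it is scaled to the
    clipping bound, which caps how much any single training record can move the sum.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


rng = random.Random(0)
batch = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
print(dp_sgd_update(batch, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Bounding each record's influence in this way is what a membership inference attack probes. Note that the formal (ε, δ) guarantee comes from a privacy accountant composed over all training steps, not from a single step.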


Key Contributions

  • Novel biHMIA attack that fuses traditional black-box MIA with biologically-informed genomic metrics (e.g., allele frequency, SNP rarity) for enhanced attack power
  • Empirical evaluation showing biHMIA achieves higher average adversarial success than traditional metric-based MIAs against DP-protected generative genomic models
  • Demonstration that small and large GPT-like transformer models are viable synthetic genetic mutation profile generators for small-scale genomics settings
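
The summary does not give the biHMIA scoring rule. The following minimal sketch shows one way a black-box, metric-based score (the model's average log-likelihood of a record) could be fused with an allele-frequency rarity signal. It assumes that each record is a list of (variant, log-probability) pairs obtained by querying the generative model, and that population allele frequencies are available from a reference panel such as the 1000 Genomes Project. The `-log(af)` weighting, the mixing weight `lam`, and all function names are hypothetical and are not the authors' formulation.

```python
import math


def rarity(allele_freq, eps=1e-6):
    """Surprisal of a variant in the reference population: rarer -> larger weight."""
    return -math.log(max(allele_freq, eps))


def metric_score(record):
    """Baseline metric-based MIA score: mean log-probability the model assigns to the record."""
    return sum(lp for _, lp in record) / len(record)


def hybrid_score(record, allele_freqs, lam=0.5):
    """Hypothetical hybrid score mixing the baseline with a rarity-weighted likelihood.

    record: list of (variant_id, log_prob) pairs produced by querying the generative model.
    allele_freqs: dict variant_id -> population allele frequency (e.g. from a reference panel).
    """
    base = metric_score(record)
    weights = [rarity(allele_freqs[v]) for v, _ in record]
    total = sum(weights)
    if total == 0.0:  # every variant is fixed in the population: no rarity signal
        return base
    weighted = sum(w * lp for w, (_, lp) in zip(weights, record)) / total
    return (1.0 - lam) * base + lam * weighted


def predict_member(score, threshold):
    """Flag a record as a training member when its score exceeds a calibrated threshold."""
    return score > threshold


freqs = {"rs1": 0.001, "rs2": 0.5}
rec_a = [("rs1", -0.1), ("rs2", -2.0)]  # rare variant reproduced confidently
rec_b = [("rs1", -2.0), ("rs2", -0.1)]  # common variant reproduced confidently
print(metric_score(rec_a), metric_score(rec_b))
print(hybrid_score(rec_a, freqs), hybrid_score(rec_b, freqs))
```

The baseline metric score cannot distinguish the two toy records because they have the same average log-likelihood. The hybrid score ranks `rec_a` higher because the model reproduces its rare variant confidently. This illustrates the intuition behind adding genomic context to a metric-based attack.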

Threat Analysis

Membership Inference Attack

The paper introduces biHMIA, a novel membership inference attack that determines whether specific genomic records were in the training set of generative models, achieving higher adversarial success than traditional metric-based MIAs against DP-protected models.
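
The summary does not define how "adversarial success" is measured. Two common ways to quantify a membership inference attack are the membership advantage (TPR minus FPR at the best threshold) and the ROC AUC. The hypothetical helpers below compute both from attack scores on known members and non-members, assuming that higher scores mean "more likely a member".

```python
def tpr_fpr(member_scores, nonmember_scores, threshold):
    """Fraction of members and non-members flagged as members (score > threshold)."""
    tpr = sum(s > threshold for s in member_scores) / len(member_scores)
    fpr = sum(s > threshold for s in nonmember_scores) / len(nonmember_scores)
    return tpr, fpr


def best_advantage(member_scores, nonmember_scores):
    """Maximum TPR - FPR over all thresholds induced by the observed scores."""
    candidates = [float("-inf")] + sorted(set(member_scores) | set(nonmember_scores))
    best = 0.0
    for t in candidates:
        tpr, fpr = tpr_fpr(member_scores, nonmember_scores, t)
        best = max(best, tpr - fpr)
    return best


def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count 1/2)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


members = [0.9, 0.8, 0.4]
nonmembers = [0.3, 0.5, 0.1]
print(best_advantage(members, nonmembers))
print(auc(members, nonmembers))
```

On a balanced evaluation set, attack accuracy at a given threshold equals (1 + TPR − FPR) / 2. An attack that guesses at random therefore has an advantage of 0, an AUC of 0.5, and an accuracy of 0.5.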


Details

Domains
generative, nlp
Model Types
transformer, llm
Threat Tags
black_box, inference_time
Datasets
1000 Genomes Project
Applications
synthetic genomic data generation, genomic privacy evaluation