Membership Inference Attacks Expose Participation Privacy in ECG Foundation Encoders
Ziyu Wang, Elahe Khatibi, Ankita Sharma, Krishnendu Chakrabarty, Sanaz Rahimi Moosavi, Farshad Firouzi, Amir Rahmani
Published on arXiv
2604.10424
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Leakage is most pronounced in small or institution-specific cohorts; for contrastive encoders it can saturate in embedding space, while larger and more diverse datasets substantially attenuate operational tail risk.
Foundation-style ECG encoders pretrained with self-supervised learning are increasingly reused across tasks, institutions, and deployment contexts, often through model-as-a-service interfaces that expose scalar scores or latent representations. While such reuse improves data efficiency and generalization, it raises a participation privacy concern: can an adversary infer whether a specific individual or cohort contributed ECG data to pretraining, even when raw waveforms and diagnostic labels are never disclosed? In connected-health settings, training participation itself may reveal institutional affiliation, study enrollment, or sensitive health context. We present an implementation-grounded audit of membership inference attacks (MIAs) against modern self-supervised ECG foundation encoders, covering contrastive objectives (SimCLR, TS2Vec) and masked reconstruction objectives (CNN- and Transformer-based MAE). We evaluate three realistic attacker interfaces: (i) score-only black-box access to scalar outputs, (ii) adaptive learned attackers that aggregate subject-level statistics across repeated queries, and (iii) embedding-access attackers that probe latent representation geometry. Using a subject-centric protocol with window-to-subject aggregation and calibration at fixed false-positive rates under a cross-dataset auditing setting, we observe heterogeneous and objective-dependent participation leakage: leakage is most pronounced in small or institution-specific cohorts and, for contrastive encoders, can saturate in embedding space, while larger and more diverse datasets substantially attenuate operational tail risk. Overall, our results show that restricting access to raw signals or labels is insufficient to guarantee participation privacy, underscoring the need for deployment-aware auditing of reusable biosignal foundation encoders in connected-health systems.
Key Contributions
- Implementation-grounded membership inference audit of self-supervised ECG foundation encoders (SimCLR, TS2Vec, MAE variants)
- Subject-centric evaluation protocol with window-to-subject aggregation under three realistic attacker interfaces: score-only, adaptive learned aggregation, and embedding-access
- Finding that participation leakage is heterogeneous and objective-dependent, most pronounced in small cohorts and contrastive encoders, with embedding access enabling stronger attacks than score-only interfaces
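The subject-centric protocol described above aggregates per-window attack scores into one score per subject, then calibrates the attack at a fixed false-positive rate. The sketch below illustrates that evaluation recipe in plain NumPy; function and variable names are hypothetical and not taken from the paper's code.

```python
# Hypothetical sketch of subject-centric MIA evaluation: window-to-subject
# aggregation plus TPR measured at a fixed false-positive rate (FPR).
# Names (aggregate_to_subjects, tpr_at_fixed_fpr) are illustrative only.
import numpy as np

def aggregate_to_subjects(window_scores, subject_ids):
    """Mean-aggregate per-window MIA scores into one score per subject."""
    subjects = np.unique(subject_ids)
    return subjects, np.array(
        [window_scores[subject_ids == s].mean() for s in subjects]
    )

def tpr_at_fixed_fpr(member_scores, nonmember_scores, fpr=0.01):
    """Calibrate a decision threshold on non-member subjects at the target
    FPR, then report the true-positive rate on member subjects."""
    threshold = np.quantile(nonmember_scores, 1.0 - fpr)
    return float((member_scores > threshold).mean())
```

Reporting TPR at a small fixed FPR (rather than average-case AUC) is what makes the "operational tail risk" framing in the abstract measurable: it captures whether an attacker can confidently identify a few participants, not just perform better than chance on average.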
🛡️ Threat Analysis
The core contribution is a systematic audit of membership inference attacks against self-supervised biosignal foundation encoders. The paper evaluates whether an adversary can determine if a specific individual's ECG data was included in pretraining, implementing score-based, learned-aggregation, and embedding-space attackers across multiple access interfaces.
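The two stronger attacker interfaces named above can be sketched as follows: a learned attacker fits a classifier over subject-level summary statistics gathered from repeated queries, while an embedding-access attacker scores a query by its proximity to reference points in the encoder's latent space. This is a minimal NumPy illustration under assumed interfaces; all names are hypothetical and the paper's actual attackers may differ.

```python
# Illustrative sketches of an adaptive learned attacker and an
# embedding-access attacker. Hypothetical helper names; NumPy only.
import numpy as np

def subject_features(window_scores):
    """Summary statistics over repeated queries for one subject."""
    s = np.asarray(window_scores, dtype=float)
    return np.array([s.mean(), s.std(), s.min(), s.max()])

def train_learned_attacker(member_feats, nonmember_feats, lr=0.1, steps=500):
    """Fit a tiny logistic-regression attacker by gradient descent:
    member subjects are labeled 1, non-member subjects 0."""
    X = np.vstack([member_feats, nonmember_feats])
    y = np.concatenate([np.ones(len(member_feats)),
                        np.zeros(len(nonmember_feats))])
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def attacker_predict(w, feats):
    """Predict membership (1) vs non-membership (0) for subject features."""
    X = np.hstack([feats, np.ones((len(feats), 1))])
    return (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)

def embedding_attack_score(query_emb, reference_embs):
    """Embedding-access attacker: cosine similarity to the nearest
    reference embedding; pretraining members tend to sit closer in
    the encoder's latent space."""
    q = query_emb / np.linalg.norm(query_emb)
    R = reference_embs / np.linalg.norm(reference_embs, axis=1, keepdims=True)
    return float((R @ q).max())
```

The embedding attacker needs no labels or shadow models, which is consistent with the finding that embedding access enables stronger attacks than score-only interfaces: latent geometry leaks more than a scalar output does.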