Assessing User Privacy Leakage in Synthetic Packet Traces: An Attack-Grounded Approach
Minhao Jin, Hongyu He, Maria Apostolaki
Published on arXiv (arXiv:2508.11742)
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
TraceBleed outperforms prior MIA baselines by 172%, and sharing more synthetic data amplifies user-level privacy leakage by 59% on average across GAN-, diffusion-, and GPT-based generators
TraceBleed
Novel technique introduced
Current synthetic traffic generators (SynNetGens) promise privacy but lack comprehensive guarantees or empirical validation, even as their fidelity steadily improves. We introduce the first attack-grounded benchmark for assessing the privacy of SynNetGens directly from the traffic they produce. We frame privacy as membership inference at the traffic-source level: a realistic and actionable threat for data holders. To this end, we present TraceBleed, the first attack that exploits behavioral fingerprints across flows using contrastive learning and temporal chunking, outperforming prior membership inference baselines by 172%. Our large-scale study across GAN-, diffusion-, and GPT-based SynNetGens uncovers critical insights: (i) SynNetGens leak user-level information; (ii) differential privacy either fails to stop these attacks or severely degrades fidelity; and (iii) sharing more synthetic data amplifies leakage by 59% on average. Finally, we introduce TracePatch, the first SynNetGen-agnostic defense that combines adversarial ML with SMT constraints to mitigate leakage while preserving fidelity.
Key Contributions
- TraceBleed: first membership inference attack against synthetic network traffic generators, using contrastive learning and temporal chunking to exploit behavioral fingerprints across flows — 172% improvement over prior MIA baselines
- Large-scale attack-grounded benchmark across GAN-, diffusion-, and GPT-based SynNetGens showing user-level privacy leakage, DP failure, and 59% leakage amplification from sharing more data
- TracePatch: first SynNetGen-agnostic defense combining adversarial ML with SMT constraints to mitigate membership inference leakage while preserving fidelity
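To make the traffic-source-level membership inference setting concrete, the sketch below shows the general shape of such an attack: split a candidate user's time-ordered flow features into temporal chunks, embed each chunk, and score membership by how closely the user's chunk embeddings match chunks of the synthetic traffic. All function names, the mean-vector embedding, and the scoring rule are illustrative assumptions, not the paper's actual TraceBleed pipeline, which uses a trained contrastive encoder in place of the stand-in `embed` below.

```python
# Hypothetical sketch of traffic-source-level membership inference.
# Names, the embedding, and the score are illustrative assumptions;
# TraceBleed itself uses contrastive learning, not this stand-in.
import numpy as np

def temporal_chunks(flows, chunk_size):
    """Split time-ordered per-flow feature rows into fixed-size chunks."""
    return [flows[i:i + chunk_size]
            for i in range(0, len(flows) - chunk_size + 1, chunk_size)]

def embed(chunk):
    """Stand-in chunk embedding: L2-normalized mean feature vector.
    A trained contrastive encoder would replace this step."""
    v = np.mean(chunk, axis=0)
    return v / (np.linalg.norm(v) + 1e-12)

def membership_score(user_flows, synthetic_flows, chunk_size=4):
    """Average best cosine similarity between the user's chunk embeddings
    and the synthetic chunk embeddings; higher = more likely the user's
    traffic was in the generator's training set."""
    user_emb = [embed(c) for c in temporal_chunks(user_flows, chunk_size)]
    syn_emb = [embed(c) for c in temporal_chunks(synthetic_flows, chunk_size)]
    return float(np.mean([max(np.dot(u, s) for s in syn_emb)
                          for u in user_emb]))
```

Under this toy scoring rule, a user whose behavioral fingerprint the generator memorized will score noticeably higher than an outside user, which is the decision signal a membership inference attacker thresholds on.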
🛡️ Threat Analysis
The paper explicitly frames the core threat as membership inference at the traffic-source level; TraceBleed is a novel MIA that determines whether a specific user's traffic was in the SynNetGen's training set, outperforming prior MIA baselines by 172%.