
Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise

Puwei Lian¹, Yujun Cai², Songze Li¹, Bingkun Bao³

0 citations · 53 references · arXiv


Published on arXiv: 2601.21628

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Semantic initial noise strongly reveals membership information in fine-tuned diffusion models under a realistic black-box end-to-end threat model, outperforming prior attacks that require intermediate denoising access or auxiliary data.

Semantic Initial Noise MIA

Novel technique introduced


Diffusion models have achieved remarkable progress in image generation, but their growing deployment raises serious privacy concerns. Fine-tuned models are especially vulnerable, as they are typically adapted on small, private datasets. Membership inference attacks (MIAs) assess privacy risk by determining whether a specific sample was part of a model's training data. Existing MIAs against diffusion models either assume access to intermediate denoising results or require auxiliary datasets to train shadow models. In this work, we exploit a critical yet overlooked vulnerability: widely used noise schedules fail to fully eliminate semantic information from images, leaving residual semantic signals even at the maximum noise step. We empirically demonstrate that a fine-tuned diffusion model captures hidden correlations between the residual semantics in the initial noise and the original images. Building on this insight, we propose a simple yet effective membership inference attack that injects semantic information into the initial noise and infers membership by analyzing the model's generation result. Extensive experiments demonstrate that semantic initial noise strongly reveals membership information, highlighting the vulnerability of diffusion models to MIAs.
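The core observation — that standard noise schedules do not fully destroy the image signal at the maximum step — can be checked numerically. A minimal sketch, assuming the conventional DDPM linear β schedule (β from 1e-4 to 0.02 over T = 1000 steps; these defaults are an assumption, not values taken from the paper): the forward process gives x_T = √ᾱ_T · x_0 + √(1 − ᾱ_T) · ε, and the signal coefficient √ᾱ_T is small but strictly positive.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # conventional DDPM linear schedule (assumed)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # cumulative product of (1 - beta_t)

# Forward process at step T: x_T = sqrt(alpha_bar_T) * x_0 + sqrt(1 - alpha_bar_T) * eps,
# so a faint scaled copy of x_0 survives whenever sqrt(alpha_bar_T) > 0.
signal_coeff = np.sqrt(alpha_bar[-1])
print(f"signal coefficient at step T: {signal_coeff:.6f}")  # small but nonzero (~6e-3)
```

Even at ~0.6% of the original amplitude, this residual is enough for a model that has memorized its fine-tuning set to latch onto, which is the correlation the attack exploits.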


Key Contributions

  • Identifies a novel vulnerability: standard noise schedules fail to fully destroy semantic information in images, leaving residual signals even at maximum noise step T
  • Proposes a black-box MIA that injects semantic content into initial noise and infers membership by comparing the model's generated output to the original image — requiring no intermediate denoising outputs or auxiliary shadow-model datasets
  • Empirically demonstrates that fine-tuned diffusion models capture hidden correlations between residual semantics in initial noise and training images, enabling strong membership discrimination

🛡️ Threat Analysis

Membership Inference Attack

The paper's entire contribution is a membership inference attack — determining whether a specific image was in a diffusion model's fine-tuning dataset. The attack injects semantic information into initial noise and measures how closely the model reproduces the original image, directly implementing the binary 'was this sample in training?' question that defines ML04.
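The attack loop described above can be sketched end-to-end. This is a hedged illustration, not the paper's implementation: the blend coefficient `t_coeff`, the MSE score, the decision `threshold`, and the `generate` callable (standing in for black-box access to the fine-tuned model's sampler) are all illustrative assumptions.

```python
import numpy as np

def semantic_noise_mia(candidate, generate, t_coeff=0.006, threshold=0.1, seed=None):
    """Hypothetical sketch of a semantic-initial-noise MIA.

    Injects the candidate image's semantics into the initial noise (mimicking
    the residual signal the forward process leaves at step T), queries the
    model end-to-end via `generate`, and scores membership by how closely the
    output reproduces the candidate. Lower score = closer reproduction.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(candidate.shape)
    # Blend a faint scaled copy of the image into otherwise-Gaussian noise,
    # matching the forward-process form x_T = c*x_0 + sqrt(1 - c^2)*eps.
    init_noise = t_coeff * candidate + np.sqrt(1.0 - t_coeff**2) * eps
    output = generate(init_noise)            # black-box, end-to-end query
    score = float(np.mean((output - candidate) ** 2))
    return score < threshold, score
```

With a stub generator that reproduces the candidate (a proxy for a memorizing member), the score is low and the sample is flagged as a member; a generator returning unrelated content yields a high score and a non-member verdict.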


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
black_box, inference_time
Datasets
Flickr, COCO
Applications
image generation, fine-tuned diffusion models