
Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise

Puwei Lian¹, Yujun Cai², Songze Li¹, Bingkun Bao³

0 citations · 53 references · arXiv


Published on arXiv: 2601.21628

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Semantic initial noise strongly reveals membership information in fine-tuned diffusion models under a realistic black-box end-to-end threat model, outperforming prior attacks that require intermediate denoising access or auxiliary data.

Semantic Initial Noise MIA

Novel technique introduced


Diffusion models have achieved remarkable progress in image generation, but their growing deployment raises serious privacy concerns. Fine-tuned models are especially vulnerable, as they are typically adapted on small, private datasets. Membership inference attacks (MIAs) assess privacy risk by determining whether a specific sample was part of a model's training data. Existing MIAs against diffusion models either assume access to intermediate denoising results or require auxiliary datasets to train shadow models. In this work, we exploit a critical yet overlooked vulnerability: widely used noise schedules fail to fully eliminate semantic information from images, leaving residual semantic signals even at the maximum noise step. We empirically demonstrate that a fine-tuned diffusion model captures hidden correlations between the residual semantics in the initial noise and the original images. Building on this insight, we propose a simple yet effective membership inference attack that injects semantic information into the initial noise and infers membership by analyzing the model's generation result. Extensive experiments demonstrate that semantic initial noise strongly reveals membership information, highlighting the vulnerability of diffusion models to MIAs.
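The core observation — that standard noise schedules do not fully destroy the image signal at the maximum step — can be checked numerically. A minimal sketch, assuming the conventional DDPM linear β schedule (β from 1e-4 to 0.02 over T = 1000 steps; these defaults are an assumption, not values taken from the paper): the forward process gives x_T = √ᾱ_T · x_0 + √(1 − ᾱ_T) · ε, and the signal coefficient √ᾱ_T is small but strictly positive.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # conventional DDPM linear schedule (assumed)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # cumulative product of (1 - beta_t)

# Forward process at step T: x_T = sqrt(alpha_bar_T) * x_0 + sqrt(1 - alpha_bar_T) * eps,
# so a faint scaled copy of x_0 survives whenever sqrt(alpha_bar_T) > 0.
signal_coeff = np.sqrt(alpha_bar[-1])
print(f"signal coefficient at step T: {signal_coeff:.6f}")  # small but nonzero (~6e-3)
```

Even at ~0.6% of the original amplitude, this residual is enough for a model that has memorized its fine-tuning set to latch onto, which is the correlation the attack exploits.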


Key Contributions

  • Identifies a novel vulnerability: standard noise schedules fail to fully destroy semantic information in images, leaving residual signals even at maximum noise step T
  • Proposes a black-box MIA that injects semantic content into initial noise and infers membership by comparing the model's generated output to the original image — requiring no intermediate denoising outputs or auxiliary shadow-model datasets
  • Empirically demonstrates that fine-tuned diffusion models capture hidden correlations between residual semantics in initial noise and training images, enabling strong membership discrimination

🛡️ Threat Analysis

Membership Inference Attack

The paper's entire contribution is a membership inference attack — determining whether a specific image was in a diffusion model's fine-tuning dataset. The attack injects semantic information into initial noise and measures how closely the model reproduces the original image, directly implementing the binary 'was this sample in training?' question that defines ML04.
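The attack loop described above can be sketched end-to-end. This is a hedged illustration, not the paper's implementation: the blend coefficient `t_coeff`, the MSE score, the decision `threshold`, and the `generate` callable (standing in for black-box access to the fine-tuned model's sampler) are all illustrative assumptions.

```python
import numpy as np

def semantic_noise_mia(candidate, generate, t_coeff=0.006, threshold=0.1, seed=None):
    """Hypothetical sketch of a semantic-initial-noise MIA.

    Injects the candidate image's semantics into the initial noise (mimicking
    the residual signal the forward process leaves at step T), queries the
    model end-to-end via `generate`, and scores membership by how closely the
    output reproduces the candidate. Lower score = closer reproduction.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(candidate.shape)
    # Blend a faint scaled copy of the image into otherwise-Gaussian noise,
    # matching the forward-process form x_T = c*x_0 + sqrt(1 - c^2)*eps.
    init_noise = t_coeff * candidate + np.sqrt(1.0 - t_coeff**2) * eps
    output = generate(init_noise)            # black-box, end-to-end query
    score = float(np.mean((output - candidate) ** 2))
    return score < threshold, score
```

With a stub generator that reproduces the candidate (a proxy for a memorizing member), the score is low and the sample is flagged as a member; a generator returning unrelated content yields a high score and a non-member verdict.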


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
black_box, inference_time
Datasets
Flickr, COCO
Applications
image generation, fine-tuned diffusion models