attack 2025

GenAI Confessions: Black-box Membership Inference for Generative Image Models

Matyas Bohacek 1, Hany Farid 2

0 citations · ICCV-W

Published on arXiv

2501.06399

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Demonstrates a black-box membership inference (MI) method that generalizes across multiple diffusion model architectures and determines training membership at the single-image level without access to the model's weights or knowledge of its architecture.


From a simple text prompt, generative-AI image models can create stunningly realistic and creative images bounded, it seems, by only our imagination. These models have achieved this remarkable feat thanks, in part, to the ingestion of billions of images collected from nearly every corner of the internet. Many creators have understandably expressed concern over how their intellectual property has been ingested without their permission or a mechanism to opt out of training. As a result, questions of fair use and copyright infringement have quickly emerged. We describe a method that allows us to determine if a model was trained on a specific image or set of images. This method is computationally efficient and assumes no explicit knowledge of the model architecture or weights (so-called black-box membership inference). We anticipate that this method will be crucial for auditing existing models and, looking ahead, ensuring the fairer development and deployment of generative AI models.


Key Contributions

  • Computationally efficient black-box membership inference method for generative image models requiring no knowledge of model architecture or weights
  • STROLL dataset of semantically matched in-training/out-of-training image pairs for rigorous MI evaluation
  • Empirical analysis of membership inference and memorization across Stable Diffusion (v1.4, v2.1, v3.0), Midjourney (v6), and DALL-E (v2)
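
A matched in-training/out-of-training dataset of the kind listed above lends itself to a simple paired evaluation. The sketch below is only an illustration under stated assumptions, not the paper's evaluation code: it assumes some membership inference method has already produced one scalar score per image, with higher meaning "more likely in the training set". The function names `paired_accuracy` and `auc` and the example scores are hypothetical.

```python
from typing import Sequence


def paired_accuracy(in_scores: Sequence[float], out_scores: Sequence[float]) -> float:
    """Fraction of matched pairs where the in-training image gets the higher score.

    Assumption: in_scores[i] and out_scores[i] belong to the same matched pair.
    """
    if len(in_scores) != len(out_scores) or not in_scores:
        raise ValueError("need two non-empty score lists of equal length")
    wins = sum(1 for s_in, s_out in zip(in_scores, out_scores) if s_in > s_out)
    return wins / len(in_scores)


def auc(in_scores: Sequence[float], out_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win, which matches the usual rank-based AUC.
    """
    if not in_scores or not out_scores:
        raise ValueError("both score lists must be non-empty")
    total = 0.0
    for s_in in in_scores:
        for s_out in out_scores:
            if s_in > s_out:
                total += 1.0
            elif s_in == s_out:
                total += 0.5
    return total / (len(in_scores) * len(out_scores))


if __name__ == "__main__":
    # Hypothetical scores for three matched pairs (not real results).
    members = [0.9, 0.8, 0.4]
    non_members = [0.5, 0.6, 0.7]
    print("paired accuracy:", paired_accuracy(members, non_members))
    print("AUC:", auc(members, non_members))
```

Paired accuracy uses the semantic matching directly, since each in-training image is compared only with its own out-of-training counterpart, while AUC ignores the pairing and measures how well the scores separate the two groups overall.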

🛡️ Threat Analysis

Membership Inference Attack

The core contribution is a membership inference method that determines whether specific images were part of a generative model's training set, which is the canonical ML04 threat. The method is black-box, architecture-agnostic, and operates at the single-image level.


Details

Domains
vision, generative
Model Types
diffusion, gan
Threat Tags
black_box, inference_time
Datasets
STROLL, Stable Diffusion training data (LAION)
Applications
text-to-image generation, generative ai copyright auditing