
Published on arXiv

2603.01053

Model Inversion Attack

OWASP ML Top 10 — ML03

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

IRA accurately predicts both the distillation algorithm and model architecture from synthetic datasets, and successfully performs membership inference and recovers sensitive real training samples.

Information Revelation Attack (IRA)

Novel technique introduced


Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage: because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.
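The mechanism the abstract describes, synthetic data optimised so that it reproduces the training signal of the real data, can be illustrated with a minimal gradient-matching sketch. This is a toy in the spirit of gradient-matching distillation, not the paper's exact method: the linear model, all shapes, and the choice to learn only the synthetic labels are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 real samples from a toy linear-regression task.
X_real = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y_real = X_real @ w_true + 0.01 * rng.normal(size=100)

# 4 synthetic samples; only the labels are learned here and the
# synthetic inputs stay fixed, to keep the derivative one line.
X_syn = rng.normal(size=(4, 5))
y_syn = rng.normal(size=4)
w = rng.normal(size=5)          # model weights the gradients are taken at

def grad_mse(X, y, w):
    """Gradient of 0.5 * mean((Xw - y)^2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

g_real = grad_mse(X_real, y_real, w)

def mismatch(y):
    return float(np.sum((grad_mse(X_syn, y, w) - g_real) ** 2))

before = mismatch(y_syn)
for _ in range(500):
    g_syn = grad_mse(X_syn, y_syn, w)
    # Gradient-descent step on ||g_syn - g_real||^2;
    # d/dy_syn ||g_syn - g_real||^2 = -(2/n) * X_syn @ (g_syn - g_real)
    y_syn += 0.2 * (2.0 / len(y_syn)) * (X_syn @ (g_syn - g_real))
after = mismatch(y_syn)
print(f"gradient mismatch: {before:.3f} -> {after:.3f}")
```

The point of the sketch is the privacy angle: the optimised synthetic set carries information about the real set's gradients (and, across training steps, its weight trajectory), which is exactly the leakage channel IRA exploits.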


Key Contributions

  • Architecture inference stage that predicts the distillation algorithm and model architecture from loss trajectories, effectively converting a black-box setting into a white-box one for the adversary
  • Membership inference attack leveraging the locally cloned white-box model's hidden-layer and final-layer outputs
  • Enhanced dual-network diffusion framework with trajectory loss for reconstructing real training samples from synthetic distilled datasets
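The first contribution treats loss trajectories as fingerprints of the (algorithm, architecture) pair. A toy version of that idea is trajectory classification by nearest centroid; the two fingerprint shapes, the label names, and the noise level below are all hypothetical stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 50
t = np.linspace(0.0, 1.0, T)

# Hypothetical premise: each (distillation algorithm, architecture)
# pair leaves a characteristic loss-trajectory shape.
fingerprints = {
    "trajectory_matching/convnet": np.exp(-4.0 * t),     # fast decay
    "gradient_matching/resnet": 1.0 / (1.0 + 6.0 * t),   # slower decay
}

def sample(name, n, noise=0.05):
    """Draw n noisy trajectories around a fingerprint."""
    return fingerprints[name] + noise * rng.normal(size=(n, T))

# "Train": average noisy trajectories per label to get centroids.
centroids = {k: sample(k, 20).mean(axis=0) for k in fingerprints}

def predict(traj):
    return min(centroids, key=lambda k: np.linalg.norm(traj - centroids[k]))

# Evaluate on fresh noisy trajectories.
correct = sum(
    predict(traj) == name
    for name in fingerprints
    for traj in sample(name, 10)
)
print(f"{correct} / 20 trajectories labelled correctly")
```

Once the algorithm and architecture are identified, the adversary can clone the victim setup locally, which is what turns the nominal black-box setting into an effective white-box one for the later stages.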

🛡️ Threat Analysis

Model Inversion Attack

The final stage of IRA uses a dual-network diffusion framework to reconstruct sensitive real training samples from synthetic distilled datasets — a direct model inversion / training data reconstruction attack with an explicit adversarial threat model.
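The paper's reconstruction stage uses a dual-network diffusion framework, which is too involved to reproduce here; as a stand-in, the sketch below shows the simplest form of the same idea, gradient-based inversion that recovers an input from the features a frozen model produces for it. The linear extractor and all shapes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

d, k = 16, 8
W = rng.normal(size=(k, d))     # frozen linear "feature extractor"
x_secret = rng.normal(size=d)   # stands in for a real training sample
target = W @ x_secret           # features the adversary observes

x = np.zeros(d)                 # reconstruction, optimised from scratch
lr = 0.02
for _ in range(5000):
    # Gradient step on 0.5 * ||W x - target||^2 with respect to x.
    x -= lr * W.T @ (W @ x - target)

residual = float(np.linalg.norm(W @ x - target))
print(f"feature residual after inversion: {residual:.2e}")
```

Because k < d the recovery is only exact on the subspace the extractor observes; the paper's diffusion prior is what supplies the missing image structure that a plain least-squares inversion like this cannot.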

Membership Inference Attack

The second stage of IRA explicitly performs membership inference — determining whether a given sample was in the real training dataset — using hidden-layer and final-layer outputs of a locally cloned model.
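The paper's attack scores membership from hidden- and final-layer outputs of the cloned model; the sketch below substitutes the classic loss-threshold baseline for membership inference, which captures the same intuition (members of the training set incur lower loss). The model, data sizes, and threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

d = 10
X_in = rng.normal(size=(20, d))                 # members (training set)
y_in = rng.integers(0, 2, size=20).astype(float)
X_out = rng.normal(size=(20, d))                # non-members
y_out = rng.integers(0, 2, size=20).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Deliberately overfit a logistic model on the members only.
w, b = np.zeros(d), 0.0
for _ in range(3000):
    err = sigmoid(X_in @ w + b) - y_in
    w -= 0.5 * X_in.T @ err / 20
    b -= 0.5 * err.mean()

def per_sample_loss(X, y):
    p = np.clip(sigmoid(X @ w + b), 1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

in_losses = per_sample_loss(X_in, y_in)
out_losses = per_sample_loss(X_out, y_out)

# Attack rule: call a sample a member when its loss is under tau.
tau = 0.5
accuracy = ((in_losses < tau).sum() + (out_losses >= tau).sum()) / 40
print(f"attack accuracy: {accuracy:.2f}")
```

The gap between member and non-member losses is what any membership-inference signal ultimately rests on; IRA's use of intermediate-layer activations gives the attacker a richer version of the same signal.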


Details

Domains
vision
Model Types
cnn, diffusion
Threat Tags
black_box, inference_time, targeted
Datasets
CIFAR-10
Applications
dataset distillation, privacy-preserving ml, image classification