Turning Black Box into White Box: Dataset Distillation Leaks
Huajie Chen 1, Tianqing Zhu 1, Yuchen Zhong 1, Yang Zhang 2, Shang Wang 3, Feng He 3, Lefeng Zhang 1, Jialiang Shen 4, Minghao Wang 1, Wanlei Zhou 1
Published on arXiv (arXiv:2603.01053)
Model Inversion Attack
OWASP ML Top 10 — ML03
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
IRA accurately predicts both the distillation algorithm and model architecture from synthetic datasets, and successfully performs membership inference and recovers sensitive real training samples.
Information Revelation Attack (IRA)
Novel technique introduced
Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage: because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.
Key Contributions
- Architecture inference stage that predicts the distillation algorithm and model architecture from loss trajectories, effectively converting a black-box setting into a white-box one for the adversary
- Membership inference attack leveraging the locally cloned white-box model's hidden-layer and final-layer outputs
- Enhanced dual-network diffusion framework with trajectory loss for reconstructing real training samples from synthetic distilled datasets
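The first stage above fingerprints the distillation algorithm and architecture from loss trajectories. The paper's actual classifier is not reproduced here; the following is a minimal sketch under the assumption that different distillation algorithms leave distinguishable loss-curve shapes, using fabricated exponential-decay curves and a nearest-centroid rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_trajectory(decay, steps=50):
    """Simulated training-loss curve: exponential decay plus small noise.
    (Fabricated stand-in for a probe model's loss trajectory on the
    synthetic dataset.)"""
    t = np.arange(steps)
    return np.exp(-decay * t) + 0.01 * rng.standard_normal(steps)

# Reference trajectories for two hypothetical distillation algorithms;
# the names and decay rates are illustrative, not from the paper.
reference = {
    "gradient_matching": np.stack([make_trajectory(0.10) for _ in range(20)]),
    "trajectory_matching": np.stack([make_trajectory(0.05) for _ in range(20)]),
}
centroids = {name: trajs.mean(axis=0) for name, trajs in reference.items()}

def infer_algorithm(observed):
    """Assign an observed loss trajectory to the nearest reference centroid."""
    return min(centroids, key=lambda n: np.linalg.norm(observed - centroids[n]))

probe = make_trajectory(0.05)  # adversary's probe run on the synthetic set
print(infer_algorithm(probe))  # matches the slower-decay reference
```

In the real attack the features would come from actual training runs on the distilled data, and the classifier would be trained over many algorithm/architecture combinations rather than two hand-set curves.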
🛡️ Threat Analysis
The final stage of IRA uses a dual-network diffusion framework to reconstruct sensitive real training samples from synthetic distilled datasets — a direct model inversion / training data reconstruction attack with an explicit adversarial threat model.
The second stage of IRA explicitly performs membership inference — determining whether a given sample was in the real training dataset — using hidden-layer and final-layer outputs of a locally cloned model.
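A common baseline for this kind of membership test is confidence thresholding: samples seen during training tend to receive higher-confidence outputs than unseen ones. The sketch below illustrates that idea on fabricated logits; IRA's actual test additionally uses hidden-layer activations of the cloned model, which are omitted here:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def is_member(logits, threshold=0.9):
    """Flag a sample as a likely training member if the cloned model's
    top softmax probability exceeds the threshold. (Illustrative rule,
    not the paper's exact scoring function.)"""
    return bool(softmax(np.asarray(logits, dtype=float)).max() > threshold)

print(is_member([8.0, 1.0, 0.5]))  # confident prediction -> True (likely member)
print(is_member([1.2, 1.0, 0.9]))  # diffuse prediction -> False (likely non-member)
```

The threshold would in practice be calibrated on shadow data the adversary controls; the white-box clone produced by the first stage is what makes the richer hidden-layer variant of this test possible.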