DRAG: Data Reconstruction Attack using Guided Diffusion
Wa-Kin Lei¹, Jun-Cheng Chen², Shang-Tse Chen¹
Published on arXiv (arXiv:2509.11724)
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
DRAG significantly outperforms state-of-the-art data reconstruction methods on deep-layer intermediate representations of vision foundation models, demonstrating previously underestimated privacy risks in split inference deployments.
DRAG
Novel technique introduced
With the rise of large foundation models, split inference (SI) has emerged as a popular computational paradigm for deploying models across lightweight edge devices and cloud servers, addressing data privacy and computational cost concerns. However, most existing data reconstruction attacks have focused on smaller CNN classification models, leaving the privacy risks of foundation models in SI settings largely unexplored. To address this gap, we propose a novel data reconstruction attack based on guided diffusion, which leverages the rich prior knowledge embedded in a latent diffusion model (LDM) pre-trained on a large-scale dataset. Our method performs iterative reconstruction on the LDM's learned image prior, effectively generating high-fidelity images resembling the original data from their intermediate representations (IR). Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods, both qualitatively and quantitatively, in reconstructing data from deep-layer IRs of the vision foundation model. The results highlight the urgent need for more robust privacy protection mechanisms for large models in SI scenarios. Code is available at: https://github.com/ntuaislab/DRAG.
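To make the split inference (SI) setting concrete, the sketch below shows a toy model partitioned between an edge client and a cloud server: only the intermediate representation (IR) crosses the network, which is exactly the tensor an attacker in this threat model observes. The model, weights, and split point are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

# Hypothetical two-layer model split across an edge device and a cloud server.
# Shapes and weights are illustrative only, not taken from the paper.
rng = np.random.default_rng(1)
W1 = rng.standard_normal((16, 32))  # client-side (edge) layer
W2 = rng.standard_normal((4, 16))   # server-side (cloud) layer

def client_forward(x):
    """Edge device: compute the IR that leaves the device."""
    return np.maximum(W1 @ x, 0.0)  # ReLU feature map

def server_forward(ir):
    """Cloud server: finish inference from the IR alone."""
    return W2 @ ir

x = rng.standard_normal(32)   # private input, never sent to the server
ir = client_forward(x)        # only this tensor crosses the network
logits = server_forward(ir)

# Sanity check: the split pipeline matches a monolithic forward pass.
full_logits = W2 @ np.maximum(W1 @ x, 0.0)
assert np.allclose(logits, full_logits)
```

The privacy question the paper studies is whether `ir`, which the server sees by design, leaks enough information to reconstruct `x`.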
Key Contributions
- DRAG: a novel data reconstruction attack using a pre-trained latent diffusion model as a guided prior to reconstruct private inputs from intermediate representations in split inference
- First systematic evaluation of data reconstruction attacks against deep-layer IRs of large vision foundation models (as opposed to prior CNN-focused work)
- Demonstrates that foundation models in split inference are significantly more vulnerable than previously assumed, with DRAG outperforming SOTA reconstruction methods both qualitatively and quantitatively
🛡️ Threat Analysis
The core contribution is a data reconstruction attack: an adversary (a malicious or curious cloud server in split inference) reconstructs private input images from the intermediate representations of a vision foundation model. The attack uses a pre-trained latent diffusion model as a prior to iteratively invert IRs back to high-fidelity images — a direct model inversion attack targeting input data privacy.
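The feature-matching core of such an inversion can be sketched with toy linear stand-ins: a "decoder" `D(z) = V z` plays the role of the generative prior and a "client model" `f(x) = W x` produces the observed IR; gradient descent on the latent `z` then drives the reconstruction's IR toward the observed one. This is a minimal illustration of IR inversion under a latent prior, not the paper's guided-diffusion procedure, and every matrix and step size here is an assumption.

```python
import numpy as np

# Toy stand-ins (NOT the paper's models): a linear "decoder" D(z) = V z plays
# the role of the generative prior, and a linear "client model" f(x) = W x
# produces the intermediate representation (IR) the attacker observes.
rng = np.random.default_rng(0)
d_latent, d_image, d_ir = 8, 32, 16
V = rng.standard_normal((d_image, d_latent))   # hypothetical decoder weights
W = rng.standard_normal((d_ir, d_image))       # hypothetical client-side layers

# Private input the server never sees directly; it observes only the IR.
x_private = V @ rng.standard_normal(d_latent)  # lies in the prior's range
r_observed = W @ x_private

# Attack: gradient descent on the latent z so that the reconstruction's IR
# matches the observed one, i.e. minimize 0.5 * ||W V z - r||^2 over z.
A = W @ V
lr = 1.0 / (np.linalg.norm(A, 2) ** 2)  # step size from the spectral norm
z = np.zeros(d_latent)
for _ in range(3000):
    residual = A @ z - r_observed
    z -= lr * (A.T @ residual)          # analytic gradient of the objective

x_reconstructed = V @ z
rel_err = np.linalg.norm(x_reconstructed - x_private) / np.linalg.norm(x_private)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In DRAG the decoder is a pre-trained LDM and the optimization is a guided diffusion process rather than plain gradient descent, but the objective — matching the victim's IR while staying on the prior's learned image manifold — is the same.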