
Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning

Xinjie Zhou 1, Zhihui Yang 1, Lechao Cheng 2, Sai Wu 1, Gang Chen 1

0 citations · 25 references · arXiv


Published on arXiv

2601.15595

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

DFSU effectively removes target PII from Pythia models while maintaining model utility, without requiring access to the original training data.

DFSU (Data-Free Selective Unlearning)

Novel technique introduced


Large language models (LLMs) exhibit powerful capabilities but risk memorizing sensitive personally identifiable information (PII) from their training data, posing significant privacy concerns. While machine unlearning techniques aim to remove such data, they predominantly depend on access to the training data. This requirement is often impractical, as training data in real-world deployments is commonly proprietary or inaccessible. To address this limitation, we propose Data-Free Selective Unlearning (DFSU), a novel privacy-preserving framework that removes sensitive PII from an LLM without requiring its training data. Our approach first synthesizes pseudo-PII through language model inversion, then constructs token-level privacy masks for these synthetic samples, and finally performs token-level selective unlearning via a contrastive mask loss within a low-rank adaptation (LoRA) subspace. Extensive experiments on the AI4Privacy PII-Masking dataset using Pythia models demonstrate that our method effectively removes target PII while maintaining model utility.
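The third step of the pipeline — token-level selective unlearning via a contrastive mask loss — can be illustrated with a simplified sketch. The exact loss formulation is not given in this summary, so the function below is an assumption: it raises the negative log-likelihood on masked (PII) tokens while anchoring the log-likelihood of unmasked tokens, which is the general shape of a token-level forget/retain objective. The function name, the `lam` weighting, and the sign conventions are all illustrative.

```python
def contrastive_mask_loss(token_logprobs, pii_mask, lam=1.0):
    """Illustrative token-level contrastive mask loss (sketch; not the
    paper's exact formulation). Tokens with pii_mask == 1 are "forget"
    targets; the rest are "retain" anchors.

    Minimizing this loss pushes PII-token log-probabilities down
    (forgetting) while keeping non-PII log-probabilities high (utility).
    """
    forget = [lp for lp, m in zip(token_logprobs, pii_mask) if m]
    retain = [lp for lp, m in zip(token_logprobs, pii_mask) if not m]
    # Forget term: minimizing the loss minimizes PII log-probability.
    forget_term = sum(forget) / max(len(forget), 1)
    # Retain term: minimizing the loss keeps retain log-probability high.
    retain_term = -sum(retain) / max(len(retain), 1)
    return forget_term + lam * retain_term
```

In practice such a loss would be applied to per-token log-probabilities produced by the model, with gradients flowing only through LoRA adapter parameters so the base weights stay frozen.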


Key Contributions

  • Data-free selective unlearning (DFSU) framework that removes PII from LLMs without requiring access to original training data
  • Language model inversion to synthesize pseudo-PII tokens representing what the model has memorized
  • Token-level contrastive mask loss within a LoRA subspace for targeted selective unlearning
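The second contribution, token-level privacy masks over the synthesized samples, amounts to marking which token positions belong to pseudo-PII spans. A minimal sketch is below; the helper name `build_privacy_mask` and the `(start, end)` span representation are hypothetical, since the summary does not specify the paper's mask construction.

```python
def build_privacy_mask(tokens, pii_spans):
    """Build a binary mask over token positions (illustrative sketch).

    tokens    : list of token strings for one synthesized sample
    pii_spans : list of (start, end) token-index pairs, end exclusive,
                marking where pseudo-PII was detected
    Returns a list of 0/1 flags, 1 where the token is part of PII.
    """
    mask = [0] * len(tokens)
    for start, end in pii_spans:
        for i in range(max(start, 0), min(end, len(tokens))):
            mask[i] = 1
    return mask
```

For example, masking a synthesized name in `["My", "name", "is", "John", "Smith", "."]` with the span `(3, 5)` flags only the two name tokens, which is what lets the unlearning loss target PII tokens without penalizing the surrounding text.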

🛡️ Threat Analysis

Model Inversion Attack

Model inversion is the core technical contribution: the framework synthesizes pseudo-PII by inverting the LLM to recover what sensitive training data it has memorized, without access to the original training corpus. In effect, this repurposes a model inversion attack as a defensive probe: the defender runs the attack against their own model to discover what must be unlearned.
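The defensive probe described above can be sketched as a sampling loop: prompt the model with PII-eliciting templates, and keep completions the model assigns unusually high likelihood, since high-confidence outputs are the ones most likely to be memorized. The `generate`/`score` callables, the sampling count, and the log-likelihood threshold are all assumptions for illustration, not the paper's interface.

```python
def invert_for_pseudo_pii(generate, score, prompts, n_samples=8, threshold=-2.0):
    """Defensive model-inversion probe (illustrative sketch).

    generate(prompt) -> str   : sample a completion from the target LLM
    score(prompt, completion) : average per-token log-likelihood the
                                model assigns its own completion
    Completions scored above `threshold` are treated as pseudo-PII
    candidates, on the heuristic that memorized strings are regenerated
    with unusually high likelihood.
    """
    candidates = []
    for prompt in prompts:
        for _ in range(n_samples):
            completion = generate(prompt)
            if score(prompt, completion) > threshold:
                candidates.append((prompt, completion))
    return candidates
```

The resulting candidates stand in for the inaccessible training data: they are what the mask-construction and unlearning steps operate on.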


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
training_time, white_box
Datasets
AI4Privacy PII-Masking dataset
Applications
large language models, pii removal, privacy-preserving nlp