
Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm

Li Cuihong, Huang Xiaowen, Yin Chuanhuan, Sang Jitao

0 citations · 21 references · arXiv


Published on arXiv · 2511.14763

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Distillation-based MIA significantly outperforms shadow model-based MIAs and individual-feature baselines across four datasets and three LLM architectures for recommendation systems.

Distillation-based MIA (MIA4LLMRS)

Novel technique introduced


Membership Inference Attack (MIA) aims to determine whether a specific data sample was included in the training dataset of a target model. Traditional MIA approaches rely on shadow models to mimic target model behavior, but their effectiveness diminishes for Large Language Model (LLM)-based recommendation systems due to the scale and complexity of training data. This paper introduces a novel knowledge distillation-based MIA paradigm tailored for LLM-based recommendation systems. Our method constructs a reference model via distillation, applying distinct strategies for member and non-member data to enhance discriminative capabilities. The paradigm extracts fused features (e.g., confidence, entropy, loss, and hidden layer vectors) from the reference model to train an attack model, overcoming limitations of individual features. Extensive experiments on extended datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and diverse LLMs (T5, GPT-2, LLaMA3) demonstrate that our approach significantly outperforms shadow model-based MIAs and individual-feature baselines. The results show its practicality for privacy attacks in LLM-driven recommender systems.
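The fused features named in the abstract can be computed directly from a reference model's per-example output distribution. The sketch below is a minimal, stdlib-only illustration of that fusion step (the function name and signature are assumptions, not the paper's code): confidence is the top predicted probability, entropy measures how spread the distribution is, loss is the cross-entropy on the true item, and the hidden-layer vector is appended as-is.

```python
import math

def extract_features(probs, target_idx, hidden_vec):
    """Fuse per-example signals from a reference model into one
    feature vector: [confidence, entropy, loss, *hidden_vec].

    probs      -- predicted probability over candidate items
    target_idx -- index of the ground-truth item
    hidden_vec -- hidden-layer activations for this example
    """
    confidence = max(probs)                                  # top probability
    entropy = -sum(p * math.log(p + 1e-12) for p in probs)   # distribution spread
    loss = -math.log(probs[target_idx] + 1e-12)              # cross-entropy on true item
    return [confidence, entropy, loss, *hidden_vec]

# Toy call on a 3-item distribution with a 4-dim hidden vector.
feats = extract_features([0.7, 0.2, 0.1], 0, [0.0] * 4)
```

Member examples typically show higher confidence and lower loss/entropy than non-members, which is what the downstream attack classifier exploits.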


Key Contributions

  • Novel knowledge distillation-based MIA paradigm that builds a reference model using distinct distillation strategies for member and non-member data, improving discriminability over shadow models
  • Fused feature extraction (confidence, entropy, loss, hidden layer vectors) to train the attack classifier, overcoming limitations of single-feature baselines
  • Extensive evaluation across four datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and three LLM families (T5, GPT-2, LLaMA3), demonstrating significant improvements over prior MIA methods

🛡️ Threat Analysis

Membership Inference Attack

The paper's core contribution is a new membership inference attack (MIA) paradigm: determining whether specific user-item interaction records were used to train an LLM-based recommendation system. The attack uses knowledge distillation to build a reference model that yields discriminative features for member versus non-member data, then trains an attack classifier on the fused features (confidence, entropy, loss, and hidden-layer vectors).
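The final stage of the pipeline described above trains a binary attack classifier on fused features. The paper does not specify the classifier here, so the following is a hedged, stdlib-only sketch using a tiny logistic regression trained by SGD on synthetic feature vectors (all data and hyperparameters are illustrative assumptions):

```python
import math
import random

def train_attack_model(X, y, lr=0.1, epochs=200):
    """Train a minimal logistic-regression attack classifier on
    fused feature vectors; returns a predict(x) -> 0/1 function."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(min(z, 30.0), -30.0)          # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))        # sigmoid
            g = p - yi                            # gradient of log-loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g

    def predict(x):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        return 1 if z > 0 else 0                  # 1 = member, 0 = non-member
    return predict

# Toy fused features [confidence, entropy, loss]: members tend to
# show higher confidence and lower entropy/loss than non-members.
random.seed(0)
members = [[0.9 + random.gauss(0, 0.05), 0.3, 0.1] for _ in range(50)]
nonmembers = [[0.5 + random.gauss(0, 0.05), 1.0, 0.7] for _ in range(50)]
X = members + nonmembers
y = [1] * 50 + [0] * 50

predict = train_attack_model(X, y)
acc = sum(predict(x) == t for x, t in zip(X, y)) / len(y)
```

On cleanly separated toy features like these the classifier reaches near-perfect training accuracy; the paper's point is that fusing several signals makes real member/non-member distributions separable where any single feature is not.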


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted
Datasets
Last.FM, MovieLens, Book-Crossing, Delicious
Applications
llm-based recommendation systems