
Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm

Li Cuihong, Huang Xiaowen, Yin Chuanhuan, Sang Jitao

0 citations · 21 references · arXiv


Published on arXiv · 2511.14763

Membership Inference Attack

OWASP ML Top 10 — ML04

Key Finding

Distillation-based MIA significantly outperforms shadow model-based MIAs and individual-feature baselines across four datasets and three LLM architectures for recommendation systems.

Distillation-based MIA (MIA4LLMRS)

Novel technique introduced


Membership Inference Attack (MIA) aims to determine whether a specific data sample was included in the training dataset of a target model. Traditional MIA approaches rely on shadow models to mimic target model behavior, but their effectiveness diminishes for Large Language Model (LLM)-based recommendation systems due to the scale and complexity of training data. This paper introduces a novel knowledge distillation-based MIA paradigm tailored for LLM-based recommendation systems. Our method constructs a reference model via distillation, applying distinct strategies for member and non-member data to enhance discriminative capabilities. The paradigm extracts fused features (e.g., confidence, entropy, loss, and hidden layer vectors) from the reference model to train an attack model, overcoming limitations of individual features. Extensive experiments on extended datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and diverse LLMs (T5, GPT-2, LLaMA3) demonstrate that our approach significantly outperforms shadow model-based MIAs and individual-feature baselines. The results show its practicality for privacy attacks in LLM-driven recommender systems.
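The fused features named in the abstract can be computed directly from a reference model's per-example output distribution. The sketch below is a minimal, stdlib-only illustration of that fusion step (the function name and signature are assumptions, not the paper's code): confidence is the top predicted probability, entropy measures how spread the distribution is, loss is the cross-entropy on the true item, and the hidden-layer vector is appended as-is.

```python
import math

def extract_features(probs, target_idx, hidden_vec):
    """Fuse per-example signals from a reference model into one
    feature vector: [confidence, entropy, loss, *hidden_vec].

    probs      -- predicted probability over candidate items
    target_idx -- index of the ground-truth item
    hidden_vec -- hidden-layer activations for this example
    """
    confidence = max(probs)                                  # top probability
    entropy = -sum(p * math.log(p + 1e-12) for p in probs)   # distribution spread
    loss = -math.log(probs[target_idx] + 1e-12)              # cross-entropy on true item
    return [confidence, entropy, loss, *hidden_vec]

# Toy call on a 3-item distribution with a 4-dim hidden vector.
feats = extract_features([0.7, 0.2, 0.1], 0, [0.0] * 4)
```

Member examples typically show higher confidence and lower loss/entropy than non-members, which is what the downstream attack classifier exploits.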


Key Contributions

  • Novel knowledge distillation-based MIA paradigm that builds a reference model using distinct distillation strategies for member and non-member data, improving discriminability over shadow models
  • Fused feature extraction (confidence, entropy, loss, hidden layer vectors) to train the attack classifier, overcoming limitations of single-feature baselines
  • Extensive evaluation across four datasets (Last.FM, MovieLens, Book-Crossing, Delicious) and three LLM families (T5, GPT-2, LLaMA3), demonstrating significant improvements over prior MIA methods

🛡️ Threat Analysis

Membership Inference Attack

The paper's core contribution is a new membership inference attack (MIA) paradigm: determining whether specific user-item interaction records were used to train an LLM-based recommendation system. The attack uses knowledge distillation to build a reference model that yields discriminative features for member versus non-member data, then trains an attack classifier on the fused features (confidence, entropy, loss, and hidden-layer vectors).
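The final stage of the pipeline described above trains a binary attack classifier on fused features. The paper does not specify the classifier here, so the following is a hedged, stdlib-only sketch using a tiny logistic regression trained by SGD on synthetic feature vectors (all data and hyperparameters are illustrative assumptions):

```python
import math
import random

def train_attack_model(X, y, lr=0.1, epochs=200):
    """Train a minimal logistic-regression attack classifier on
    fused feature vectors; returns a predict(x) -> 0/1 function."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(min(z, 30.0), -30.0)          # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))        # sigmoid
            g = p - yi                            # gradient of log-loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g

    def predict(x):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        return 1 if z > 0 else 0                  # 1 = member, 0 = non-member
    return predict

# Toy fused features [confidence, entropy, loss]: members tend to
# show higher confidence and lower entropy/loss than non-members.
random.seed(0)
members = [[0.9 + random.gauss(0, 0.05), 0.3, 0.1] for _ in range(50)]
nonmembers = [[0.5 + random.gauss(0, 0.05), 1.0, 0.7] for _ in range(50)]
X = members + nonmembers
y = [1] * 50 + [0] * 50

predict = train_attack_model(X, y)
acc = sum(predict(x) == t for x, t in zip(X, y)) / len(y)
```

On cleanly separated toy features like these the classifier reaches near-perfect training accuracy; the paper's point is that fusing several signals makes real member/non-member distributions separable where any single feature is not.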


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted
Datasets
Last.FM, MovieLens, Book-Crossing, Delicious
Applications
llm-based recommendation systems