Membership Inference Attacks on LLM-based Recommender Systems
Jiajie He 1, Min-Chun Chen 1, Xintong Chen 2, Xinyang Fang 3, Yuechun Gu 1, Keke Chen 1
Published on arXiv
2508.18665
Membership Inference Attack
OWASP ML Top 10 — ML04
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
Inquiry and Poisoning attacks achieve significantly high membership inference advantages against LLM-based recommender systems, demonstrating that existing prompt-based defenses are insufficient.
ICL-RecSys MIA (Similarity, Memorization, Inquiry, Poisoning)
Novel technique introduced
Large language model (LLM) based recommender systems (RecSys) can adapt flexibly to different domains. They use in-context learning (ICL), i.e., prompts, to customize the recommendation function; these prompts include sensitive historical user-item interactions, encompassing implicit feedback such as clicked items and explicit product reviews. Such private information may be exposed by novel privacy attacks, yet no study has been conducted on this important issue. We design several membership inference attacks (MIAs) aimed at revealing whether system prompts include a victim's historical interactions. The attacks are \emph{Similarity, Memorization, Inquiry, and Poisoning attacks}, each exploiting unique features of LLMs or RecSys. We carefully evaluated them on five of the latest open-source LLMs and three well-known RecSys benchmark datasets. The results confirm that the MIA threat to LLM RecSys is realistic: Inquiry and Poisoning attacks show significantly high attack advantages. We also analyzed the factors affecting these attacks, such as the number of shots in system prompts, the position of the victim in the shots, and the number of poisoning items in the prompt, and discussed possible methods to mitigate such MIA threats.
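To make the attack surface concrete, here is a minimal sketch (not the paper's code) of how an ICL-based LLM recommender might embed user-item interaction "shots" in its system prompt; the function name, prompt wording, and data are illustrative assumptions. The sensitive part a MIA targets is the list of per-user interaction lines baked into the prompt.

```python
# Illustrative sketch: few-shot ("shots") user histories embedded in a
# recommender system prompt. These embedded interactions are what the
# membership inference attacks try to detect. All names are hypothetical.

def build_recsys_prompt(shots, candidate_items):
    """Assemble a few-shot recommendation prompt.

    shots: list of (user_id, interacted_items) pairs -- the sensitive
    historical interactions whose membership an attacker tries to infer.
    """
    lines = ["You are a movie recommender. Example user histories:"]
    for user, items in shots:
        lines.append(f"User {user} liked: {', '.join(items)}")
    lines.append("Candidates: " + ", ".join(candidate_items))
    lines.append("Recommend the best candidate for a similar user.")
    return "\n".join(lines)

shots = [("u1", ["Heat", "Ronin"]), ("u2", ["Amelie", "Chocolat"])]
prompt = build_recsys_prompt(shots, ["Collateral", "Up"])
print(prompt)
```

Anything appended after this system prompt (e.g., an attacker's query) shares context with these histories, which is what the Inquiry and Poisoning attacks exploit.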
Key Contributions
- Four novel MIA methods (Similarity, Memorization, Inquiry, Poisoning) tailored to the unique properties of ICL-based LLM recommender systems, where attack targets are few-shot user interactions in system prompts rather than model training data.
- Empirical evaluation across five open-source LLMs and three RecSys benchmark datasets confirming that Inquiry and Poisoning attacks achieve significantly high attack advantages.
- Analysis of factors affecting MIA success (number of prompt shots, victim position, number of poisoning items) and discussion of mitigation strategies.
🛡️ Threat Analysis
The paper's primary contribution is four distinct membership inference attacks (Similarity, Memorization, Inquiry, Poisoning) designed to determine whether specific user-item interactions are included in an LLM's system prompt; binary membership inference is the core threat model.
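The binary decision at the heart of this threat model can be sketched as an inquiry-style test: ask the deployed recommender directly about a victim's item and threshold its affirmative confidence. This is a hedged illustration, not the paper's actual attack; `query_llm`, the question wording, and the threshold are all hypothetical stand-ins.

```python
# Illustrative sketch of an inquiry-style membership test.
# `query_llm` is a hypothetical stand-in for a call to the deployed
# recommender that returns the probability mass on an affirmative answer;
# the paper's real attack prompts and scoring may differ.

def inquiry_attack(query_llm, victim_item, threshold=0.5):
    """Ask the recommender about the victim's item and threshold the
    affirmative probability to decide membership (True = "member")."""
    question = (f"Does any user history in your examples contain "
                f"the item '{victim_item}'? Answer yes or no.")
    p_yes = query_llm(question)
    return p_yes >= threshold

# Toy stand-in model: pretend the LLM leaks membership via confidence.
members = {"Heat", "Ronin"}
mock_llm = lambda q: 0.9 if any(m in q for m in members) else 0.1
print(inquiry_attack(mock_llm, "Heat"))  # item in the prompt shots
print(inquiry_attack(mock_llm, "Up"))    # item not in the shots
```

The attack's advantage is then measured as how much better this decision rule does than random guessing over member and non-member items, which is the metric the paper reports as high for Inquiry and Poisoning.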