SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems
Kaihong Li, Huichi Zhou, Bin Ma, Fangjun Huang
Published on arXiv: 2509.24961
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
SemanticShield (1.5B parameters, GRPO-finetuned) surpasses Llama-3-70B-Instruct in shilling detection accuracy and generalizes to previously unseen attack strategies
Novel technique introduced: SemanticShield
Recommender systems (RS) are widely used in e-commerce for personalized suggestions, yet their openness makes them susceptible to shilling attacks, where adversaries inject fake behaviors to manipulate recommendations. Most existing defenses emphasize user-side behaviors while overlooking item-side features such as titles and descriptions that can expose malicious intent. To address this gap, we propose a two-stage detection framework that integrates item-side semantics via large language models (LLMs). The first stage pre-screens suspicious users using low-cost behavioral criteria, and the second stage employs LLM-based auditing to evaluate semantic consistency. Furthermore, we enhance the auditing model through reinforcement fine-tuning on a lightweight LLM with carefully designed reward functions, yielding a specialized detector called SemanticShield. Experiments on six representative attack strategies demonstrate the effectiveness of SemanticShield against shilling attacks, and further evaluation on previously unseen attack methods shows its strong generalization capability. Code is available at https://github.com/FrankenstLee/SemanticShield.
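The two-stage flow from the abstract can be sketched in a minimal, self-contained form. Everything here is an illustrative stand-in: the paper's first stage uses PCA-based behavioral criteria (not the extreme-rating heuristic below), and its second stage is an actual LLM auditor (SemanticShield), not a keyword overlap check. Names like `behavioral_score` and `llm_audit` are assumptions for this sketch.

```python
# Hypothetical sketch of the two-stage detection pipeline: a cheap behavioral
# pre-screen followed by a semantic-consistency audit of item-side text.
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    user_id: str
    ratings: dict                               # item_id -> rating in [1, 5]
    reviews: dict = field(default_factory=dict) # item_id -> review text

def behavioral_score(profile: UserProfile, max_rating: int = 5) -> float:
    """Stage 1 proxy: fraction of ratings pinned at the extremes.
    Shilling profiles often push or nuke targets with extreme ratings.
    (The paper's pre-screen is PCA-based; this is a toy stand-in.)"""
    if not profile.ratings:
        return 0.0
    extreme = sum(1 for r in profile.ratings.values() if r in (1, max_rating))
    return extreme / len(profile.ratings)

def llm_audit(profile: UserProfile, item_texts: dict) -> bool:
    """Stage 2 stand-in: crude semantic consistency between a review and the
    item's title/description. The paper uses an LLM judgment here."""
    for item_id, review in profile.reviews.items():
        item_words = set(item_texts.get(item_id, "").lower().split())
        review_words = set(review.lower().split())
        if not item_words & review_words:
            return False  # review semantically disconnected from the item
    return True

def detect_shilling(profiles, item_texts, prescreen_threshold=0.9):
    """Pre-screen cheaply, then run the costlier audit only on suspects."""
    suspicious = [p for p in profiles if behavioral_score(p) >= prescreen_threshold]
    return [p.user_id for p in suspicious if not llm_audit(p, item_texts)]
```

The point of the two-stage split is cost: the expensive (LLM) audit only runs on the small subset flagged by the cheap behavioral filter.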
Key Contributions
- Two-stage detection framework combining PCA-based behavioral pre-screening with LLM semantic auditing of item-side features to detect shilling attack profiles
- Reinforcement fine-tuning of Qwen2.5-1.5B-Instruct via GRPO with task-specific reward functions, yielding SemanticShield — a lightweight specialized detector that outperforms Llama-3-70B-Instruct
- Demonstrated generalization to unseen attack strategies across six representative shilling attack methods on real-world datasets
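The GRPO fine-tuning mentioned above centers on group-relative advantages: several completions are sampled per prompt, and each completion's reward is normalized against its group's mean and standard deviation. The reward terms below are assumptions for illustration (a correctness term for the fake/genuine verdict plus a format term); the paper designs its own task-specific rewards, and the `<answer>` tag convention is hypothetical.

```python
# Hedged sketch of a GRPO-style reward and group-relative advantage.
import re
import statistics

def reward(completion: str, gold_label: str) -> float:
    """Illustrative reward: +0.2 if the output is parseable in the assumed
    <answer>...</answer> format, +1.0 if the verdict matches the gold label."""
    r = 0.0
    m = re.search(r"<answer>\s*(fake|genuine)\s*</answer>", completion, re.I)
    if m:
        r += 0.2  # format reward
        if m.group(1).lower() == gold_label:
            r += 1.0  # correctness reward
    return r

def group_advantages(rewards):
    """GRPO normalizes each sampled completion's reward within its group,
    so no separate value network is needed to estimate a baseline."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard uniform-reward groups
    return [(r - mu) / sigma for r in rewards]
```

Because advantages are computed relative to the group, completions that are merely average earn no gradient signal, which is part of what makes GRPO practical for small models like the 1.5B detector here.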
🛡️ Threat Analysis
Shilling attacks inject fake user profiles into recommender system interaction data to manipulate model behavior — a canonical form of data poisoning. SemanticShield is a defense that detects this poisoning by combining statistical pre-filtering with LLM-based semantic consistency auditing.
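To make the threat concrete, here is a sketch of one classic shilling strategy, the "average attack": filler items are rated near their dataset means (to blend in) while the target item is pushed to the maximum rating. This is a generic textbook construction, not one of the paper's six evaluated attack methods specifically, and all parameter names are illustrative.

```python
# Illustrative construction of an average-attack shilling profile.
import random

def average_attack_profile(item_means, target_item, n_filler=10,
                           max_rating=5, rng=None):
    """Build one fake profile: fillers mimic average behavior, target is pushed."""
    rng = rng or random.Random(0)
    candidates = [i for i in item_means if i != target_item]
    fillers = rng.sample(candidates, min(n_filler, len(candidates)))
    profile = {i: round(item_means[i]) for i in fillers}  # look statistically normal
    profile[target_item] = max_rating                     # manipulate the target
    return profile
```

Profiles like this are exactly what the two-stage defense must separate from genuine users: their rating statistics are deliberately camouflaged, which is why item-side semantic signals (titles, descriptions, review text) add detection power beyond behavioral features alone.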