SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models
Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün, Sinem Sav
Published on arXiv (2604.01147)
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Achieves AUC-ROC of 0.7913 on StarCoder2-3B and 0.7867 on StarCoder2-7B, consistently outperforming the Loss, Min-K% Prob, and PAC baselines on a 25,000-sample balanced dataset
SERSEM
Novel technique introduced
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination through Membership Inference Attacks (MIAs) has become critical. We propose SERSEM (Selective Entropy-Weighted Scoring for Membership Inference), a novel white-box attack framework that suppresses uninformative syntactical boilerplate to amplify specific memorization signals. SERSEM utilizes a dual-signal methodology: first, a continuous character-level weight mask is derived through static Abstract Syntax Tree (AST) analysis, spellchecking-based multilingual logic detection, and offline linting; second, these heuristic weights are used to pool internal transformer activations and calibrate token-level Z-scores from the output logits. Evaluated on a 25,000-sample balanced dataset, SERSEM achieves a global AUC-ROC of 0.7913 on the StarCoder2-3B model and 0.7867 on the StarCoder2-7B model, consistently outperforming the implemented probability-based baselines Loss, Min-K% Prob, and PAC. Our findings demonstrate that focusing on human-centric coding anomalies provides a significantly more robust indicator of verbatim memorization than sequence-level probability averages.
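The first signal, the AST-derived character-level weight mask, can be sketched as follows. This is an illustrative approximation only: it uses Python's standard `ast` module to up-weight developer-chosen names and literals while leaving keywords and punctuation at a low default, and does not reproduce the paper's spellchecking-based multilingual detection or offline-linting heuristics. The function name and the `low`/`high` weight values are assumptions.

```python
import ast

def char_weight_mask(source: str, low: float = 0.1, high: float = 1.0) -> list:
    """Hypothetical sketch: per-character weights over Python source.

    Boilerplate (keywords, punctuation) keeps the low default; spans
    covered by identifier and literal AST nodes, which carry the
    memorization signal, get the high weight.
    """
    lines = source.splitlines(keepends=True)
    # absolute offset of each line's first character within `source`
    starts, pos = [], 0
    for ln in lines:
        starts.append(pos)
        pos += len(ln)
    mask = [low] * len(source)
    for node in ast.walk(ast.parse(source)):
        # names, constants, and function parameters reflect human choices
        if isinstance(node, (ast.Name, ast.Constant, ast.arg)):
            if node.end_lineno is None or node.end_col_offset is None:
                continue
            a = starts[node.lineno - 1] + node.col_offset
            b = starts[node.end_lineno - 1] + node.end_col_offset
            for i in range(a, b):
                mask[i] = high
    return mask

src = "def add(x, y):\n    return x + y\n"
mask = char_weight_mask(src)
```

Here the characters of `x` and `y` receive the high weight while the `def` and `return` keywords stay at the boilerplate default.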
Key Contributions
- Novel SERSEM attack framework that uses AST-based weight masks to suppress syntactical boilerplate and amplify memorization signals
- Dual-signal methodology combining character-level heuristic weights with internal transformer activation pooling and token-level Z-score calibration
- Achieves 0.7913 AUC-ROC on StarCoder2-3B and 0.7867 on StarCoder2-7B, outperforming probability-based MIA baselines
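The scoring side of the dual-signal methodology (heuristic weights calibrating token-level Z-scores) can be illustrated with a minimal sketch. The function name, the within-sequence normalization, and the dot-product aggregation are assumptions for illustration; the paper additionally pools internal transformer activations, which is not reproduced here.

```python
import numpy as np

def sersem_style_score(token_logprobs: np.ndarray, weights: np.ndarray) -> float:
    """Hypothetical sketch: weight-calibrated Z-score membership signal.

    Z-normalizes per-token log-probs within the sequence, then averages
    them under a normalized heuristic weight mask so that boilerplate
    tokens are suppressed and anomalous tokens dominate the score.
    """
    z = (token_logprobs - token_logprobs.mean()) / (token_logprobs.std() + 1e-8)
    w = weights / (weights.sum() + 1e-8)
    return float(np.dot(w, z))

# toy sequence: two surprising (low log-prob) tokens carry high weight
logprobs = np.array([-0.1, -5.2, -0.3, -4.8, -0.2])
weights = np.array([0.1, 1.0, 0.1, 1.0, 0.1])
score = sersem_style_score(logprobs, weights)
```

Because the high-weight tokens are the improbable ones, the weighted score is pulled well below zero, unlike a plain sequence-level average that would dilute them.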
🛡️ Threat Analysis
The core contribution is a membership inference attack (MIA) that determines whether specific code samples were in the training set of StarCoder2 models. The proposed SERSEM method achieves roughly 0.79 AUC-ROC, outperforming the Loss, Min-K% Prob, and PAC baselines.
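For context, the probability-based baselines SERSEM is compared against can be sketched as below. These follow the common formulations of the Loss and Min-K% Prob attacks from the MIA literature, not this paper's exact implementation; the PAC baseline is omitted.

```python
import numpy as np

def loss_attack_score(token_logprobs: np.ndarray) -> float:
    # Loss attack: a higher mean log-prob (i.e. lower loss) on a sample
    # suggests the model has seen it during training.
    return float(token_logprobs.mean())

def min_k_prob_score(token_logprobs: np.ndarray, k: float = 0.2) -> float:
    # Min-K% Prob: average only the k fraction of lowest-probability
    # tokens; members tend to have fewer extreme outlier tokens.
    n = max(1, int(len(token_logprobs) * k))
    return float(np.sort(token_logprobs)[:n].mean())

lp = np.array([-0.1, -5.2, -0.3, -4.8, -0.2])
loss_score = loss_attack_score(lp)      # sequence-level average
min_k_score = min_k_prob_score(lp)      # focuses on the worst tokens
```

Both baselines reduce to (partial) averages of token log-probs, which is exactly the sequence-level signal the paper argues is diluted by syntactic boilerplate.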