SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models
Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün, Sinem Sav
Published on arXiv (2604.01147)
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
Achieves AUC-ROC of 0.7913 on StarCoder2-3B and 0.7867 on StarCoder2-7B, consistently outperforming the Loss, Min-K% Prob, and PAC baselines on a 25,000-sample balanced dataset
SERSEM
Novel technique introduced
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination through Membership Inference Attacks (MIAs) has become critical. We propose SERSEM (Selective Entropy-Weighted Scoring for Membership Inference), a novel white-box attack framework that suppresses uninformative syntactical boilerplate to amplify specific memorization signals. SERSEM utilizes a dual-signal methodology: first, a continuous character-level weight mask is derived through static Abstract Syntax Tree (AST) analysis, spellchecking-based multilingual logic detection, and offline linting; second, these heuristic weights are used to pool internal transformer activations and calibrate token-level Z-scores from the output logits. Evaluated on a 25,000-sample balanced dataset, SERSEM achieves a global AUC-ROC of 0.7913 on the StarCoder2-3B model and 0.7867 on the StarCoder2-7B model, consistently outperforming the implemented probability-based baselines Loss, Min-K% Prob, and PAC. Our findings demonstrate that focusing on human-centric coding anomalies provides a significantly more robust indicator of verbatim memorization than sequence-level probability averages.
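The first signal, the AST-derived character-level weight mask, can be sketched as follows. This is an illustrative approximation only: it uses Python's standard `ast` module to up-weight developer-chosen names and literals while leaving keywords and punctuation at a low default, and does not reproduce the paper's spellchecking-based multilingual detection or offline-linting heuristics. The function name and the `low`/`high` weight values are assumptions.

```python
import ast

def char_weight_mask(source: str, low: float = 0.1, high: float = 1.0) -> list:
    """Hypothetical sketch: per-character weights over Python source.

    Boilerplate (keywords, punctuation) keeps the low default; spans
    covered by identifier and literal AST nodes, which carry the
    memorization signal, get the high weight.
    """
    lines = source.splitlines(keepends=True)
    # absolute offset of each line's first character within `source`
    starts, pos = [], 0
    for ln in lines:
        starts.append(pos)
        pos += len(ln)
    mask = [low] * len(source)
    for node in ast.walk(ast.parse(source)):
        # names, constants, and function parameters reflect human choices
        if isinstance(node, (ast.Name, ast.Constant, ast.arg)):
            if node.end_lineno is None or node.end_col_offset is None:
                continue
            a = starts[node.lineno - 1] + node.col_offset
            b = starts[node.end_lineno - 1] + node.end_col_offset
            for i in range(a, b):
                mask[i] = high
    return mask

src = "def add(x, y):\n    return x + y\n"
mask = char_weight_mask(src)
```

Here the characters of `x` and `y` receive the high weight while the `def` and `return` keywords stay at the boilerplate default.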
Key Contributions
- Novel SERSEM attack framework that uses AST-based weight masks to suppress syntactical boilerplate and amplify memorization signals
- Dual-signal methodology combining character-level heuristic weights with internal transformer activation pooling and token-level Z-score calibration
- Achieves 0.7913 AUC-ROC on StarCoder2-3B and 0.7867 on StarCoder2-7B, outperforming probability-based MIA baselines
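The scoring side of the dual-signal methodology (heuristic weights calibrating token-level Z-scores) can be illustrated with a minimal sketch. The function name, the within-sequence normalization, and the dot-product aggregation are assumptions for illustration; the paper additionally pools internal transformer activations, which is not reproduced here.

```python
import numpy as np

def sersem_style_score(token_logprobs: np.ndarray, weights: np.ndarray) -> float:
    """Hypothetical sketch: weight-calibrated Z-score membership signal.

    Z-normalizes per-token log-probs within the sequence, then averages
    them under a normalized heuristic weight mask so that boilerplate
    tokens are suppressed and anomalous tokens dominate the score.
    """
    z = (token_logprobs - token_logprobs.mean()) / (token_logprobs.std() + 1e-8)
    w = weights / (weights.sum() + 1e-8)
    return float(np.dot(w, z))

# toy sequence: two surprising (low log-prob) tokens carry high weight
logprobs = np.array([-0.1, -5.2, -0.3, -4.8, -0.2])
weights = np.array([0.1, 1.0, 0.1, 1.0, 0.1])
score = sersem_style_score(logprobs, weights)
```

Because the high-weight tokens are the improbable ones, the weighted score is pulled well below zero, unlike a plain sequence-level average that would dilute them.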
🛡️ Threat Analysis
The core contribution is a membership inference attack (MIA) that determines whether specific code samples were in the training set of StarCoder2 models. The proposed SERSEM method achieves roughly 0.79 AUC-ROC, outperforming the Loss, Min-K% Prob, and PAC baselines.
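For context, the probability-based baselines SERSEM is compared against can be sketched as below. These follow the common formulations of the Loss and Min-K% Prob attacks from the MIA literature, not this paper's exact implementation; the PAC baseline is omitted.

```python
import numpy as np

def loss_attack_score(token_logprobs: np.ndarray) -> float:
    # Loss attack: a higher mean log-prob (i.e. lower loss) on a sample
    # suggests the model has seen it during training.
    return float(token_logprobs.mean())

def min_k_prob_score(token_logprobs: np.ndarray, k: float = 0.2) -> float:
    # Min-K% Prob: average only the k fraction of lowest-probability
    # tokens; members tend to have fewer extreme outlier tokens.
    n = max(1, int(len(token_logprobs) * k))
    return float(np.sort(token_logprobs)[:n].mean())

lp = np.array([-0.1, -5.2, -0.3, -4.8, -0.2])
loss_score = loss_attack_score(lp)      # sequence-level average
min_k_score = min_k_prob_score(lp)      # focuses on the worst tokens
```

Both baselines reduce to (partial) averages of token log-probs, which is exactly the sequence-level signal the paper argues is diluted by syntactic boilerplate.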