
SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models

Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün, Sinem Sav


Published on arXiv (2604.01147)

Membership Inference Attack

OWASP ML Top 10: ML04

Key Finding

Achieves an AUC-ROC of 0.7913 on StarCoder2-3B and 0.7867 on StarCoder2-7B, consistently outperforming the Loss, Min-K% Prob, and PAC baselines on a 25,000-sample balanced dataset

SERSEM

Novel technique introduced


As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination through Membership Inference Attacks (MIAs) has become critical. We propose SERSEM (Selective Entropy-Weighted Scoring for Membership Inference), a novel white-box attack framework that suppresses uninformative syntactical boilerplate to amplify specific memorization signals. SERSEM utilizes a dual-signal methodology: first, a continuous character-level weight mask is derived through static Abstract Syntax Tree (AST) analysis, spellchecking-based multilingual logic detection, and offline linting. Second, these heuristic weights are used to pool internal transformer activations and calibrate token-level Z-scores from the output logits. Evaluated on a 25,000-sample balanced dataset, SERSEM achieves a global AUC-ROC of 0.7913 on the StarCoder2-3B model and 0.7867 on the StarCoder2-7B model, consistently outperforming the implemented probability-based baselines Loss, Min-K% Prob, and PAC. Our findings demonstrate that focusing on human-centric coding anomalies provides a significantly more robust indicator of verbatim memorization than sequence-level probability averages.
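The paper's exact scoring function is not reproduced here, but the core aggregation idea the abstract describes, Z-scoring per-token log-probabilities and pooling them under a heuristic weight mask that suppresses boilerplate, can be sketched as follows. All names and the normalization details are simplifications, not the authors' implementation:

```python
import math

def weighted_membership_score(token_logprobs, token_weights):
    """Aggregate per-token log-probabilities into a membership score,
    down-weighting boilerplate tokens via a heuristic weight mask.

    token_logprobs: model log-probability for each token of the sample.
    token_weights: heuristic weights in [0, 1] (e.g. from AST analysis);
                   near 0 for syntactic boilerplate, near 1 for tokens
                   likely to carry human-specific memorization signal.
    (Simplified stand-in for SERSEM's calibration; details are assumptions.)
    """
    n = len(token_logprobs)
    mean = sum(token_logprobs) / n
    var = sum((lp - mean) ** 2 for lp in token_logprobs) / n
    std = math.sqrt(var) or 1.0  # guard against zero variance
    # Z-score each token against the sequence's own statistics,
    # then take a weight-normalized average.
    z = [(lp - mean) / std for lp in token_logprobs]
    wsum = sum(token_weights) or 1.0
    return sum(w * zi for w, zi in zip(token_weights, z)) / wsum
```

With uniform weights the Z-scores average to zero, so any separation between members and non-members must come from the mask concentrating on informative tokens, which is the intuition behind suppressing boilerplate.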


Key Contributions

  • Novel SERSEM attack framework that uses AST-based weight masks to suppress syntactical boilerplate and amplify memorization signals
  • Dual-signal methodology combining character-level heuristic weights with internal transformer activation pooling and token-level Z-score calibration
  • Achieves 0.7913 AUC-ROC on StarCoder2-3B and 0.7867 on StarCoder2-7B, outperforming probability-based MIA baselines
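As one illustration of how a static AST pass could yield the character-level weight mask described above, the sketch below uses Python's `ast` module to up-weight identifier and literal spans, where human-specific choices live, over structural keywords and punctuation. The weight values and node selection are assumptions; the paper's multilingual spellchecking and linting signals are not reproduced:

```python
import ast

def ast_weight_mask(source):
    """Per-character weight mask over parseable Python source:
    1.0 for name/constant/argument spans, 0.2 (boilerplate) elsewhere.
    A simplified stand-in for SERSEM's static-analysis mask."""
    weights = [0.2] * len(source)  # default: boilerplate weight
    # Map (line, column) positions to flat character offsets.
    offsets = [0]
    for line in source.splitlines(keepends=True):
        offsets.append(offsets[-1] + len(line))
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Name, ast.Constant, ast.arg)):
            start = offsets[node.lineno - 1] + node.col_offset
            end = offsets[node.end_lineno - 1] + node.end_col_offset
            for i in range(start, min(end, len(source))):
                weights[i] = 1.0
    return weights
```

In this toy version, `def`, parentheses, and operators keep the low default weight, so the pooled score in the previous sketch is dominated by the tokens a model could only predict well by having memorized the sample.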

🛡️ Threat Analysis

Membership Inference Attack

The core contribution is a membership inference attack (MIA) that determines whether specific code samples were in the training set of StarCoder2 models. The proposed SERSEM method achieves roughly 0.79 AUC-ROC, outperforming the Loss, Min-K% Prob, and PAC baselines.
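For context, the Min-K% Prob baseline that SERSEM is compared against scores a sample by averaging only the log-probabilities of its k% least likely tokens (members tend to have few surprisingly unlikely tokens). A minimal sketch, with the default k as an assumption:

```python
def min_k_percent_prob(token_logprobs, k=20):
    """Min-K% Prob membership score: mean log-probability of the
    k% lowest-probability tokens. Higher scores suggest the sample
    was seen during training."""
    n = max(1, int(len(token_logprobs) * k / 100))
    lowest = sorted(token_logprobs)[:n]  # the most "surprising" tokens
    return sum(lowest) / len(lowest)
```

Unlike SERSEM, this baseline selects tokens purely by probability rank, with no notion of which tokens are boilerplate, which is the gap the entropy-weighted mask is designed to close.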


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, training_time
Datasets
The Stack v2, The Heap
Applications
code generation, code completion