Detecting Data Contamination in LLMs via In-Context Learning
Michał Zawalski , Meriem Boubdir , Klaudia Bałazy , Besmira Nushi , Pablo Ribalta
Published on arXiv (2510.27055)
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
CoDeC produces contamination scores that cleanly separate seen from unseen datasets and reveals strong evidence of memorization in open-weight LLMs with undisclosed training data.
CoDeC
Novel technique introduced
We present Contamination Detection via Context (CoDeC), a practical and accurate method to detect and quantify training data contamination in large language models. CoDeC distinguishes between data memorized during training and data outside the training distribution by measuring how in-context learning affects model performance. We find that in-context examples typically boost confidence for unseen datasets but may reduce it when the dataset was part of training, due to disrupted memorization patterns. Experiments show that CoDeC produces interpretable contamination scores that clearly separate seen and unseen datasets, and reveals strong evidence of memorization in open-weight models with undisclosed training corpora. The method is simple, automated, and both model- and dataset-agnostic, making it easy to integrate with benchmark evaluations.
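The mechanism in the abstract — in-context examples raise confidence on unseen data but can lower it on memorized data — can be sketched as a score over a dataset. This is a minimal illustration, not the authors' implementation: the `confidence` callable (e.g. mean token log-probability from an LLM) and the drop-counting aggregation are assumptions standing in for the paper's actual scoring details.

```python
# Hedged sketch of the CoDeC-style signal: compare per-example model
# confidence with and without in-context demonstrations drawn from the
# same dataset. A confidence *drop* once demonstrations are added is
# treated as memorization evidence; the score aggregates those drops.
# `confidence(example, shots)` is a hypothetical stand-in for a real
# LLM scoring function (e.g. mean token log-prob of `example` given
# `shots` as context).
from typing import Callable, Sequence


def contamination_score(
    examples: Sequence[str],
    confidence: Callable[[str, Sequence[str]], float],
    n_shots: int = 4,
) -> float:
    """Fraction of examples whose confidence falls when in-context
    demonstrations from the same dataset are prepended."""
    drops = 0
    for i, ex in enumerate(examples):
        # Use other examples from the dataset as demonstrations.
        shots = [e for j, e in enumerate(examples) if j != i][:n_shots]
        zero_shot = confidence(ex, [])    # no context
        few_shot = confidence(ex, shots)  # with in-context examples
        if few_shot < zero_shot:          # disrupted memorization pattern
            drops += 1
    return drops / len(examples)
```

Under this sketch, a score near 0 matches the "unseen" regime (context helps everywhere) and a score near 1 matches the "seen" regime (context consistently disrupts memorized continuations); real scores would fall in between and be calibrated per model.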
Key Contributions
- CoDeC: an automated, model- and dataset-agnostic method that uses in-context learning signal shifts to detect training data contamination in LLMs
- Observation that in-context examples typically boost confidence on unseen data but can reduce it on memorized data by disrupting learned memorization patterns
- Interpretable contamination scores that clearly separate seen and unseen benchmark datasets, including evidence of memorization in open-weight models with undisclosed training corpora
🛡️ Threat Analysis
CoDeC determines whether specific datasets were part of an LLM's training set — this is membership inference at the dataset level, using in-context learning dynamics as the probe rather than shadow models or confidence thresholds. The binary 'seen vs. unseen' determination maps directly to ML04's core definition.