Detecting Data Contamination in LLMs via In-Context Learning
Michał Zawalski , Meriem Boubdir , Klaudia Bałazy , Besmira Nushi , Pablo Ribalta
Published on arXiv (2510.27055)
Membership Inference Attack
OWASP ML Top 10 — ML04
Key Finding
CoDeC produces contamination scores that cleanly separate seen from unseen datasets and reveals strong evidence of memorization in open-weight LLMs with undisclosed training data.
CoDeC
Novel technique introduced
We present Contamination Detection via Context (CoDeC), a practical and accurate method to detect and quantify training data contamination in large language models. CoDeC distinguishes between data memorized during training and data outside the training distribution by measuring how in-context learning affects model performance. We find that in-context examples typically boost confidence for unseen datasets but may reduce it when the dataset was part of training, due to disrupted memorization patterns. Experiments show that CoDeC produces interpretable contamination scores that clearly separate seen and unseen datasets, and reveals strong evidence of memorization in open-weight models with undisclosed training corpora. The method is simple, automated, and both model- and dataset-agnostic, making it easy to integrate with benchmark evaluations.
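The mechanism in the abstract — in-context examples raise confidence on unseen data but can lower it on memorized data — can be sketched as a score over a dataset. This is a minimal illustration, not the authors' implementation: the `confidence` callable (e.g. mean token log-probability from an LLM) and the drop-counting aggregation are assumptions standing in for the paper's actual scoring details.

```python
# Hedged sketch of the CoDeC-style signal: compare per-example model
# confidence with and without in-context demonstrations drawn from the
# same dataset. A confidence *drop* once demonstrations are added is
# treated as memorization evidence; the score aggregates those drops.
# `confidence(example, shots)` is a hypothetical stand-in for a real
# LLM scoring function (e.g. mean token log-prob of `example` given
# `shots` as context).
from typing import Callable, Sequence


def contamination_score(
    examples: Sequence[str],
    confidence: Callable[[str, Sequence[str]], float],
    n_shots: int = 4,
) -> float:
    """Fraction of examples whose confidence falls when in-context
    demonstrations from the same dataset are prepended."""
    drops = 0
    for i, ex in enumerate(examples):
        # Use other examples from the dataset as demonstrations.
        shots = [e for j, e in enumerate(examples) if j != i][:n_shots]
        zero_shot = confidence(ex, [])    # no context
        few_shot = confidence(ex, shots)  # with in-context examples
        if few_shot < zero_shot:          # disrupted memorization pattern
            drops += 1
    return drops / len(examples)
```

Under this sketch, a score near 0 matches the "unseen" regime (context helps everywhere) and a score near 1 matches the "seen" regime (context consistently disrupts memorized continuations); real scores would fall in between and be calibrated per model.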
Key Contributions
- CoDeC: an automated, model- and dataset-agnostic method that uses in-context learning signal shifts to detect training data contamination in LLMs
- Observation that in-context examples typically boost confidence on unseen data but can reduce it on memorized data by disrupting learned memorization patterns
- Interpretable contamination scores that clearly separate seen and unseen benchmark datasets, including evidence of memorization in open-weight models with undisclosed training corpora
🛡️ Threat Analysis
CoDeC determines whether specific datasets were part of an LLM's training set — this is membership inference at the dataset level, using in-context learning dynamics as the probe rather than shadow models or confidence thresholds. The binary 'seen vs. unseen' determination maps directly to ML04's core definition.