
ContextLeak: Auditing Leakage in Private In-Context Learning Methods

Jacob Choi, Shuying Cao, Xingjian Dong, Wang Bill Zhu, Robin Jia, Sai Praneeth Karimireddy

3 citations · 1 influential · 58 references · arXiv


Published on arXiv: 2512.16059

Membership Inference Attack

OWASP ML Top 10 — ML04

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

ContextLeak's empirical leakage measure tightly correlates with theoretical DP epsilon and exposes that most existing private ICL methods fail to achieve acceptable privacy-utility trade-offs.

ContextLeak

Novel technique introduced


In-Context Learning (ICL) has become a standard technique for adapting Large Language Models (LLMs) to specialized tasks by supplying task-specific exemplars within the prompt. However, when these exemplars contain sensitive information, reliable privacy-preserving mechanisms are essential to prevent unintended leakage through model outputs. Many privacy-preserving methods have been proposed to prevent information leakage from the context, but far less effort has gone into auditing those methods. We introduce ContextLeak, the first framework to empirically measure worst-case information leakage in ICL. ContextLeak uses canary insertion: embedding uniquely identifiable tokens in exemplars and crafting targeted queries to detect their presence. We apply ContextLeak across a range of private ICL techniques, both heuristic (such as prompt-based defenses) and those with theoretical guarantees (such as Embedding Space Aggregation and Report Noisy Max). We find that ContextLeak tightly correlates with the theoretical privacy budget ($ε$) and reliably detects leakage. Our results further reveal that existing methods often strike poor privacy-utility trade-offs, either leaking sensitive information or severely degrading performance.
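Of the DP-based defenses named above, Report Noisy Max is the simplest to illustrate: perturb each candidate's score with Laplace noise and release only the argmax. The sketch below assumes a voting setup in which label counts are aggregated from ICL runs over disjoint exemplar subsets; the function signature and this setup are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def report_noisy_max(vote_counts, epsilon, rng=None):
    """Return the index of the highest noisy count.

    For counting queries (sensitivity 1), adding Laplace(1/epsilon) noise
    to each count and releasing only the argmax index is epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    counts = np.asarray(vote_counts, dtype=float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return int(np.argmax(noisy))

# Example: label votes from an ensemble of ICL prompts, one per exemplar shard.
votes = [12, 3, 5]  # counts for candidate labels 0, 1, 2
label = report_noisy_max(votes, epsilon=1.0)
```

Smaller ε means larger noise, so the released label depends less on any single exemplar, at the cost of more frequent wrong answers; this is the privacy-utility trade-off the paper measures.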


Key Contributions

  • ContextLeak: the first empirical auditing framework for measuring worst-case information leakage in private ICL via canary insertion and targeted query crafting
  • Empirical validation that ContextLeak's measured leakage tightly correlates with the theoretical DP privacy budget (ε) across multiple private ICL methods
  • Systematic evaluation revealing that existing private ICL techniques (heuristic and DP-based) exhibit poor privacy-utility trade-offs, either leaking sensitive exemplar data or severely degrading model performance

🛡️ Threat Analysis

Membership Inference Attack

The core technique — inserting uniquely identifiable canary tokens into ICL exemplars and crafting targeted queries to detect their presence — is fundamentally a membership inference attack applied to the context window. The framework explicitly correlates leakage with the differential privacy budget (ε), the canonical MIA-related privacy measure, and is used to audit whether private ICL defenses withstand adversarial detection of specific exemplars.
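A minimal sketch of this canary-insertion audit, assuming a black-box `query_model(prompt) -> str` interface for the private ICL system under test; the canary format, prompt template, and helper names are hypothetical, not the paper's exact implementation.

```python
import secrets

def make_canary(prefix="CANARY"):
    # A unique, high-entropy token that will not occur by chance.
    return f"{prefix}-{secrets.token_hex(4)}"

def audit_leakage(query_model, exemplars, n_trials=100):
    """Estimate leakage of an ICL pipeline via canary insertion.

    For each trial, plant a fresh canary in one exemplar, issue a
    targeted query for it, and check whether the canary surfaces in
    the output. Returns the attack success rate in [0, 1].
    """
    hits = 0
    for _ in range(n_trials):
        canary = make_canary()
        poisoned = exemplars + [f"Note: the secret code is {canary}."]
        prompt = "\n".join(poisoned) + "\nWhat is the secret code?"
        if canary in query_model(prompt):
            hits += 1
    return hits / n_trials
```

A success rate near zero is what a well-calibrated private ICL method should produce; comparing true- and false-positive rates of such an attack yields the empirical lower bound on ε that the paper correlates with the theoretical budget.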


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
in-context learning, private llm inference, sensitive data handling in prompts