
Tight and Practical Privacy Auditing for Differentially Private In-Context Learning

Yuyang Xia · Ruixuan Liu · Li Xiong

0 citations · 38 references · arXiv


Published on arXiv · 2511.13502

Membership Inference Attack

OWASP ML Top 10 — ML04

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Empirical leakage estimates closely match theoretical DP budgets on classification tasks and are consistently lower on generation tasks due to conservative embedding-sensitivity bounds, confirming the framework's tightness as a practical auditor.


Large language models (LLMs) perform in-context learning (ICL) by adapting to tasks from prompt demonstrations, which in practice often contain private or proprietary data. Although differential privacy (DP) with private voting is a pragmatic mitigation, DP-ICL implementations are error-prone, and worst-case DP bounds may substantially overestimate actual leakage, calling for practical auditing tools. We present a tight and efficient privacy auditing framework for DP-ICL systems that runs membership inference attacks and translates their success rates into empirical privacy guarantees using Gaussian DP. Our analysis of the private voting mechanism identifies vote configurations that maximize the auditing signal, guiding the design of audit queries that reliably reveal whether a canary demonstration is present in the context. The framework supports both black-box (API-only) and white-box (internal vote) threat models, and unifies auditing for classification and generation by reducing both to a binary decision problem. Experiments on standard text classification and generation benchmarks show that our empirical leakage estimates closely match theoretical DP budgets on classification tasks and are consistently lower on generation tasks due to conservative embedding-sensitivity bounds, making our framework a practical privacy auditor and verifier for real-world DP-ICL deployments.
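The abstract's core conversion — turning an attack's observed true/false positive rates into an empirical Gaussian DP guarantee — can be sketched with the standard GDP trade-off relation. Under μ-GDP, any attacker operating at false positive rate FPR satisfies TPR ≤ Φ(Φ⁻¹(FPR) + μ), so an observed operating point lower-bounds μ; the μ-GDP parameter can then be mapped to an (ε, δ) budget for comparison against the theoretical one. This is a minimal stdlib-only sketch of those two textbook conversions, not the paper's exact estimator (which also accounts for sampling uncertainty in the measured rates):

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: _N.cdf is Phi, _N.inv_cdf is Phi^{-1}

def empirical_gdp_mu(tpr: float, fpr: float) -> float:
    """Lower bound on the Gaussian DP parameter mu implied by an attack
    observed at operating point (fpr, tpr).  Under mu-GDP the optimal
    attack satisfies TPR = Phi(Phi^{-1}(FPR) + mu), so any observed
    point certifies mu >= Phi^{-1}(TPR) - Phi^{-1}(FPR)."""
    return _N.inv_cdf(tpr) - _N.inv_cdf(fpr)

def gdp_to_delta(mu: float, eps: float) -> float:
    """delta(eps) achieved by a mu-GDP mechanism (standard GDP-to-(eps,delta)
    conversion), useful for comparing an empirical mu against a theoretical
    (eps, delta) budget."""
    return _N.cdf(-eps / mu + mu / 2) - math.exp(eps) * _N.cdf(-eps / mu - mu / 2)
```

For instance, an attack with TPR 0.9 at FPR 0.1 certifies μ ≈ 2.56; a smaller measured μ than the theoretical one indicates the deployment leaks less than its worst-case bound, which is the pattern the paper reports on generation tasks.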


Key Contributions

  • First tight and practical privacy auditing framework for DP-ICL systems that translates MIA success rates into empirical Gaussian DP guarantees
  • Theoretical analysis of the private voting mechanism identifying vote configurations that maximize auditing signal, guiding effective audit query design
  • Unified auditing across classification and generation tasks by reducing both to a binary decision problem, supporting black-box and white-box threat models

🛡️ Threat Analysis

Membership Inference Attack

The framework's core mechanism is running membership inference attacks — determining whether a specific canary demonstration was present in the ICL context — and converting attack success rates into empirical privacy guarantees. The paper designs optimal audit queries to maximize MIA signal against the private voting mechanism.
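The audit loop described above can be illustrated with a toy Monte-Carlo experiment against a Gaussian-noise private-voting release. The setup is hypothetical and simplified: the audit query is constructed so that every ensemble member votes "A" except, when the canary demonstration is in the context, one member votes "B"; a white-box attacker thresholds the noisy vote count for "B". The resulting TPR/FPR pair is exactly what the framework would feed into the Gaussian DP conversion. All names and parameters here are illustrative, not the paper's implementation:

```python
import random

def run_audit(sigma: float = 1.0, trials: int = 20000, seed: int = 0):
    """Toy canary audit of a private-voting mechanism with Gaussian noise
    (std sigma) on per-label vote counts.  The canary shifts the "B" vote
    count from 0 to 1; the attacker guesses "canary present" when the
    noisy "B" count exceeds the midpoint 0.5.  Returns (TPR, FPR)."""
    rng = random.Random(seed)
    tp = fp = 0
    for _ in range(trials):
        # world WITH canary: true "B" count is 1; world WITHOUT: 0
        noisy_b_in = 1.0 + rng.gauss(0.0, sigma)
        noisy_b_out = 0.0 + rng.gauss(0.0, sigma)
        tp += noisy_b_in > 0.5   # correct "member" guess
        fp += noisy_b_out > 0.5  # false "member" guess
    return tp / trials, fp / trials
```

With σ = 1 the measured point lands near (TPR ≈ 0.69, FPR ≈ 0.31), and plugging it into the Gaussian DP relation recovers μ ≈ 1/σ, i.e. the audit is tight for this vote configuration — the single-vote-gap design is what maximizes the distinguishing signal per query.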


Details

Domains
nlp
Model Types
llm · transformer
Threat Tags
black_box · white_box · inference_time
Datasets
standard text classification benchmarks · standard text generation benchmarks
Applications
llm in-context learning · text classification · text generation