LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models
Published on arXiv
2604.23795
Membership Inference Attack
OWASP ML Top 10 — ML04
Sensitive Information Disclosure
OWASP LLM Top 10 — LLM06
Key Finding
DP-SGD reduces MIA attacker advantage by 71.5% while improving out-of-distribution utility by 47-50% relative to the overfitted baseline, on DistilGPT-2 fine-tuned on a synthetic clinical PII dataset
LLM-CEG
Novel technique introduced
This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models (LLMs). We propose LLM-CEG, a systematic framework that employs membership inference attack (MIA) success rates as an empirical privacy gauge and model perplexity as a utility gauge, iteratively adjusting differential privacy parameters until both thresholds are jointly satisfied. A proof-of-concept prototype fine-tunes DistilGPT-2 on a synthetic clinical PII dataset under four privacy regimes using DP-SGD. Results indicate that DP-SGD reduces MIA attacker advantage by 71.5% while simultaneously improving out-of-distribution utility by 47-50% relative to the overfitted baseline, suggesting that differential privacy may act as implicit regularization under narrow fine-tuning conditions. We further extend the SIED engineering framework to the LLM context as LLM-SIED, providing an auditable, regulator-aligned process for privacy-compliant LLM deployment.
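The iterative adjustment described in the abstract can be sketched as a simple search over privacy settings. This is a minimal illustration, not the authors' implementation: the names `GaugeReading`, `calibrate`, and `toy_train_and_audit`, the candidate noise multipliers, and both thresholds are assumptions made for the example. The toy evaluator returns made-up numbers and does not reproduce the paper's results; in practice it would be replaced by a DP-SGD fine-tuning run followed by an MIA audit and a perplexity measurement.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class GaugeReading:
    noise_multiplier: float  # DP-SGD noise setting that was tried (hypothetical knob)
    mia_advantage: float     # privacy gauge: lower is better
    perplexity: float        # utility gauge: lower is better


def calibrate(
    train_and_audit: Callable[[float], Tuple[float, float]],
    noise_multipliers: Iterable[float],
    max_advantage: float,
    max_perplexity: float,
) -> Optional[GaugeReading]:
    """Return the first setting whose readings meet both thresholds, else None."""
    for sigma in noise_multipliers:
        advantage, ppl = train_and_audit(sigma)
        if advantage <= max_advantage and ppl <= max_perplexity:
            return GaugeReading(sigma, advantage, ppl)
    return None


def toy_train_and_audit(sigma: float) -> Tuple[float, float]:
    # Made-up stand-in for "fine-tune with DP-SGD, then run the MIA and
    # measure perplexity". The formulas are illustrative only.
    advantage = 0.40 / (1.0 + 4.0 * sigma)
    ppl = 20.0 + 10.0 * sigma
    return advantage, ppl


result = calibrate(
    toy_train_and_audit,
    noise_multipliers=[0.0, 0.5, 1.0, 2.0],
    max_advantage=0.10,
    max_perplexity=35.0,
)
print(result)
```

Returning `None` when no candidate passes keeps the failure case explicit, which suits an audit setting where an unmet threshold should block deployment instead of silently falling back to the last configuration tried.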
Key Contributions
- LLM-CEG framework extending the Classification Error Gauge (x-CEG) framework to LLM privacy auditing, with MIA success rate as the privacy gauge and perplexity as the utility gauge
- LLM-SIED engineering process for privacy-compliant LLM deployment, aligned with the EU AI Act and the NIST AI RMF
- Empirical finding that DP-SGD reduces MIA advantage by 71.5% while improving out-of-distribution utility by 47-50% relative to the overfitted baseline, suggesting that differential privacy may act as implicit regularization under narrow fine-tuning conditions
🛡️ Threat Analysis
The paper's primary focus is measuring and defending against membership inference attacks (MIA) on LLMs. It uses MIA success rates as the empirical privacy gauge and evaluates how effectively DP-SGD lowers attacker advantage, reporting a 71.5% reduction.
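The two gauges can be made concrete with a short sketch. It assumes one common definition of attacker advantage (true positive rate minus false positive rate) applied to a simple loss-threshold attack, and the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood. The summary does not state which attack or advantage definition the paper uses, so these choices, the function names, and all numbers below are illustrative assumptions.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def mia_advantage(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> float:
    """Advantage (TPR - FPR) of a loss-threshold attack.

    The attack guesses "member" whenever a sample's loss is below the threshold.
    """
    tpr = sum(loss < threshold for loss in member_losses) / len(member_losses)
    fpr = sum(loss < threshold for loss in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr


def relative_reduction(baseline: float, treated: float) -> float:
    """Fractional drop from baseline to treated, e.g. 0.6 means a 60% reduction."""
    return (baseline - treated) / baseline


# Hypothetical per-sample losses, not data from the paper.
members = [0.5, 0.8, 1.2, 2.5]
nonmembers = [1.5, 2.2, 2.8, 3.1]

advantage = mia_advantage(members, nonmembers, threshold=2.0)
ppl = perplexity([math.log(2.0)] * 3)
drop = relative_reduction(0.5, 0.2)
print(advantage, ppl, drop)
```

An advantage of 0 corresponds to an attacker doing no better than chance, so a percentage reduction in advantage measures how much closer to chance the attack performs after the defense is applied.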