benchmark 2026

Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs

Ruixuan Liu 1, David Evans 2, Li Xiong 1

0 citations


Published on arXiv

2604.18697

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Demonstrates that neither differential privacy bounds nor low measured membership inference risk prevents data extraction from LLM APIs, and provides a tight extraction risk estimator together with actionable mitigation guidelines

(l,b)-inextractability

Novel technique introduced


Indistinguishability properties such as differential privacy bounds or low empirically measured membership inference are widely treated as proxies to show a model is sufficiently protected against broader memorization risks. However, we show that indistinguishability properties are neither sufficient nor necessary for preventing data extraction in LLM APIs. We formalize a privacy-game separation between extraction and indistinguishability-based privacy, showing that indistinguishability and inextractability are incomparable: upper-bounding distinguishability does not upper-bound extractability. To address this gap, we introduce $(l, b)$-inextractability as a definition that requires at least $2^b$ expected queries for any black-box adversary to induce the LLM API to emit a protected $l$-gram substring. We instantiate this via a worst-case extraction game and derive a rank-based extraction risk upper bound for targeted exact extraction, as well as extensions to cover untargeted and approximate extraction. The resulting estimator captures the extraction risk over multiple attack trials and prefix adaptations. We show that it can provide a tight and efficient estimation for standard greedy extraction and an upper bound on the probabilistic extraction risk given any decoding configuration. We empirically evaluate extractability across different models, clarifying its connection to distinguishability, demonstrating its advantage over existing extraction risk estimators, and providing actionable mitigation guidelines across model training, API access, and decoding configurations in LLM API deployment. Our code is publicly available at: https://github.com/Emory-AIMS/Inextractability.
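The query-budget reading of $(l, b)$-inextractability can be illustrated with a small sketch. It assumes a simplified setting that the abstract does not state: a fixed attack strategy whose queries succeed independently with the same probability `p_success`, so the expected number of queries until the first success is `1 / p_success`. The function names are hypothetical and are not taken from the authors' repository.

```python
import math


def expected_queries(p_success: float) -> float:
    """Expected queries until the first success, assuming i.i.d. trials.

    ASSUMPTION: each query succeeds independently with probability
    p_success (geometric distribution), which is a simplification of
    the adaptive black-box adversary in the paper's definition.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success


def extraction_bits(p_success: float) -> float:
    """The b for which 2**b equals the expected query count: b = -log2(p)."""
    return math.log2(expected_queries(p_success))


def meets_budget(p_success: float, b: float) -> bool:
    """True if this one strategy needs at least 2**b expected queries."""
    return extraction_bits(p_success) >= b
```

Under this assumption, a per-query success probability of 0.25 corresponds to 4 expected queries, that is, b = 2. The definition itself quantifies over every black-box adversary, so a check like `meets_budget` for a single strategy can only refute the property, not establish it.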


Key Contributions

  • Formalized the (l,b)-inextractability definition, which requires at least 2^b expected queries for any black-box adversary to extract a protected l-gram substring
  • Proved privacy-game separation: indistinguishability and inextractability are incomparable properties
  • Developed rank-based extraction risk estimator covering targeted/untargeted and exact/approximate extraction

🛡️ Threat Analysis

Model Inversion Attack

The paper addresses training data extraction attacks on LLM APIs, in which a black-box adversary induces the model to emit protected l-gram substrings. It proposes the (l,b)-inextractability definition and a rank-based estimator to measure extraction risk.
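To show why token ranks are a natural signal for extraction risk, here is a minimal sketch. It is not the paper's estimator. It assumes the caller already has, for each token of the protected l-gram, its 1-indexed rank in the model's next-token distribution given the prefix and the preceding target tokens. The function names are hypothetical, and ties in token probabilities are ignored.

```python
from typing import Sequence


def _check(target_ranks: Sequence[int]) -> None:
    # Ranks are assumed 1-indexed; an empty target is rejected.
    if len(target_ranks) == 0 or min(target_ranks) < 1:
        raise ValueError("need a non-empty sequence of ranks >= 1")


def greedy_emits_target(target_ranks: Sequence[int]) -> bool:
    """Greedy decoding reproduces the target only if every token is rank 1."""
    _check(target_ranks)
    return all(rank == 1 for rank in target_ranks)


def reachable_under_top_k(target_ranks: Sequence[int], k: int) -> bool:
    """Top-k sampling can emit the target only if no token's rank exceeds k."""
    _check(target_ranks)
    return max(target_ranks) <= k
```

For example, ranks `[1, 1, 3]` mean greedy decoding from that prefix does not emit the target, and top-k sampling with k = 2 cannot emit it either, while k = 3 leaves it reachable. This is one simple way a decoding configuration can change extractability.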


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, targeted, untargeted
Applications
llm api deployment, text generation