Machine Text Detectors are Membership Inference Attacks

Ryuto Koike 1,2, Liam Dugan 2, Masahiro Kaneko 3, Chris Callison-Burch 2, Naoaki Okazaki 1

1 citation · 1 influential citation · 46 references · arXiv

Published on arXiv: 2510.19492

Membership Inference Attack

OWASP ML Top 10 — ML04

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Machine text detectors and MIA methods share the same optimal likelihood-ratio metric, with cross-task rank correlation of ρ ≈ 0.7, and a text detector (Binoculars) outperforms dedicated MIA methods on membership inference benchmarks.

MINT

Novel technique introduced


Although membership inference attacks (MIAs) and machine-generated text detection target different goals, their methods often exploit similar signals based on a language model's probability distribution, yet the two tasks have been studied independently. This can result in conclusions that overlook stronger methods and valuable insights from the other task. In this work, we theoretically and empirically demonstrate the transferability, i.e., how well a method originally developed for one task performs on the other, between MIAs and machine text detection. We prove that the metric achieving asymptotically optimal performance is identical for both tasks. We unify existing methods under this optimal metric and hypothesize that the accuracy with which a method approximates this metric is directly correlated with its transferability. Our large-scale empirical experiments demonstrate very strong rank correlation ($\rho \approx 0.7$) in cross-task performance. Notably, we also find that a machine text detector achieves the strongest performance among evaluated methods on both tasks, demonstrating the practical impact of transferability. To facilitate cross-task development and fair evaluation, we introduce MINT, a unified evaluation suite for MIAs and machine-generated text detection, implementing 15 recent methods from both tasks.
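The shared optimal metric is a likelihood ratio between the target model and a reference distribution. A minimal sketch of that idea, using hypothetical per-token log-probabilities rather than the paper's actual implementation:

```python
def likelihood_ratio_score(logp_target, logp_ref):
    """Average per-token log-likelihood ratio between a target model and a
    reference model. A higher score means the target model finds the text
    unusually likely relative to the reference -- the common signal behind
    both membership inference and machine-text detection.
    """
    assert len(logp_target) == len(logp_ref)
    return (sum(logp_target) - sum(logp_ref)) / len(logp_target)

# Toy per-token log-probabilities (hypothetical values for illustration).
seen_text   = likelihood_ratio_score([-1.2, -0.8, -1.0], [-2.5, -2.0, -2.2])
unseen_text = likelihood_ratio_score([-2.4, -2.6, -2.1], [-2.3, -2.5, -2.0])
print(seen_text > unseen_text)  # training members score higher under the target
```

In practice the log-probabilities would come from scoring the text with two language models; the same thresholded score can then be read as a membership verdict or a machine-text verdict.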


Key Contributions

  • Theoretical proof that the likelihood ratio test is the asymptotically optimal metric for both MIAs and machine-generated text detection, unifying both tasks under a common framework.
  • Large-scale empirical demonstration of strong cross-task rank correlation (ρ ≈ 0.7), showing that a state-of-the-art text detector (Binoculars) achieves top performance on MIA benchmarks.
  • MINT: a unified evaluation suite implementing 15 recent methods from both MIA and machine text detection research, enabling fair cross-task comparison.

🛡️ Threat Analysis

Membership Inference Attack

Membership inference attacks are a primary subject: the paper analyzes methods that determine whether a text sample was in an LLM's training set, proves the optimal MIA metric, and evaluates 15 MIA methods in the unified MINT benchmark.
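The simplest attack in this family thresholds the target model's loss on a candidate sample. A toy sketch of that loss-based MIA (the threshold and scores are hypothetical, chosen only to illustrate the decision rule):

```python
def loss_attack(nll_per_token, threshold=2.0):
    """Classic loss-based MIA: predict 'member' when the target model's
    average negative log-likelihood on the sample falls below a threshold.
    The threshold here is a hypothetical value for illustration only."""
    avg_nll = sum(nll_per_token) / len(nll_per_token)
    return avg_nll < threshold

print(loss_attack([1.1, 0.9, 1.3]))  # low loss  -> likely seen in training
print(loss_attack([3.2, 2.8, 3.5]))  # high loss -> likely unseen
```

Stronger MIAs evaluated in MINT refine this by calibrating the loss against a reference model or against perturbed copies of the text, which approximates the optimal likelihood ratio more closely.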

Output Integrity Attack

Machine-generated text detection is the co-equal subject: the paper analyzes and evaluates detectors distinguishing human vs. AI-generated text, and shows a state-of-the-art detector achieves the strongest performance on MIA tasks, directly linking content authenticity verification to ML04 methods.
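Binoculars, the detector highlighted above, contrasts two related models: it divides the observer model's log-perplexity on the text by the cross-perplexity between a performer and the observer. A simplified sketch of that ratio over a toy three-token vocabulary (the distributions are made up; the real method uses paired LLMs):

```python
import math

def log_ppl(logps):
    # Observer model's average negative log-likelihood of the observed tokens.
    return -sum(logps) / len(logps)

def log_cross_ppl(perf_dists, obs_dists):
    # Average cross-entropy between the performer's and observer's
    # next-token distributions (each a probability vector over the vocab).
    total = 0.0
    for p, q in zip(perf_dists, obs_dists):
        total += -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return total / len(perf_dists)

def binoculars_style_score(logps_observer, perf_dists, obs_dists):
    # Low ratios suggest machine text; higher ratios suggest human text.
    return log_ppl(logps_observer) / log_cross_ppl(perf_dists, obs_dists)

score = binoculars_style_score(
    [math.log(0.6), math.log(0.5)],                # observer probs of actual tokens
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],            # performer next-token dists
    [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]],            # observer next-token dists
)
print(score)
```

Because this score is itself a calibrated likelihood statistic, the paper's finding that it transfers to membership inference follows naturally from the shared optimal metric.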


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, grey_box
Applications
membership inference, ai-generated text detection, llm training data auditing