benchmark 2025

BAID: A Benchmark for Bias Assessment of AI Detectors

Priyam Basu, Yunfeng Zhang, Vipul Raheja

0 citations · 26 references · arXiv


Published on arXiv

2512.11505

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

All four evaluated detectors show consistently low recall for texts from underrepresented demographic groups, indicating systematic sociolinguistic bias in deployed AI text detectors

BAID

Novel technique introduced


AI-generated text detectors have recently gained adoption in educational and professional contexts. Prior research has uncovered isolated cases of bias, particularly against English Language Learners (ELLs); however, there is a lack of systematic evaluation of such systems across broader sociolinguistic factors. In this work, we propose BAID, a comprehensive evaluation framework for AI detectors across various types of biases. As part of the framework, we introduce over 200k samples spanning 7 major categories: demographics, age, educational grade level, dialect, formality, political leaning, and topic. We also generate synthetic versions of each sample with carefully crafted prompts that preserve the original content while reflecting subgroup-specific writing styles. Using this benchmark, we evaluate four open-source state-of-the-art AI text detectors and find consistent disparities in detection performance, particularly low recall rates for texts from underrepresented groups. Our contributions provide a scalable, transparent approach for auditing AI detectors and emphasize the need for bias-aware evaluation before these tools are deployed for public use.
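
To make the counterpart-generation step concrete, here is a minimal sketch of how subgroup-styled synthetic versions of each sample could be produced. It is illustrative only: the paper's actual prompts are not reproduced in this summary, and `build_rewrite_prompt`, `synthesize_counterparts`, `generate`, and the record fields are hypothetical names standing in for whatever LLM and data schema the authors use.

```python
# Hypothetical sketch of the synthetic-counterpart step described above.
# `generate` stands in for any text-generation call (hosted LLM API or local model).

def build_rewrite_prompt(original_text: str, subgroup: str) -> str:
    """Ask the LLM to preserve content while adopting a subgroup's writing style."""
    return (
        f"Rewrite the following text so that it keeps the same meaning and facts, "
        f"but reflects the typical writing style of the '{subgroup}' subgroup "
        f"(e.g., dialect, formality, grade level). Return only the rewritten text.\n\n"
        f"Text:\n{original_text}"
    )

def synthesize_counterparts(samples, subgroups, generate):
    """Produce one machine-generated counterpart per (sample, subgroup) pair."""
    counterparts = []
    for sample in samples:
        for subgroup in subgroups:
            prompt = build_rewrite_prompt(sample["text"], subgroup)
            counterparts.append({
                "source_id": sample["id"],
                "subgroup": subgroup,
                "text": generate(prompt),   # LLM call supplied by the caller
                "label": "ai-generated",
            })
    return counterparts
```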


Key Contributions

  • BAID evaluation framework with 200k+ samples spanning 7 sociolinguistic bias categories (demographics, age, grade level, dialect, formality, political leaning, topic)
  • Synthetic counterpart generation using LLM prompting to preserve content while reflecting subgroup-specific writing styles
  • Systematic evaluation of four open-source AI text detectors revealing consistent performance disparities against underrepresented groups
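
The core audit behind the third contribution, per-subgroup detection recall on AI-generated text, can be sketched as follows. This is not the authors' evaluation code; `detector`, the record fields, and `recall_gap` are assumed names used only to illustrate how such a bias audit is typically computed.

```python
# Illustrative sketch of a per-subgroup audit: recall on AI-generated text,
# broken down by sociolinguistic subgroup, so gaps against
# underrepresented groups become visible.

from collections import defaultdict

def recall_by_subgroup(samples, detector):
    """samples: dicts with 'text', 'subgroup', and gold 'label' in {'human', 'ai-generated'}.
    detector: callable returning True when it flags a text as AI-generated."""
    hits = defaultdict(int)     # correctly flagged AI texts per subgroup
    totals = defaultdict(int)   # all AI texts per subgroup
    for s in samples:
        if s["label"] != "ai-generated":
            continue            # recall is computed over AI-generated texts only
        totals[s["subgroup"]] += 1
        if detector(s["text"]):
            hits[s["subgroup"]] += 1
    return {g: hits[g] / totals[g] for g in totals if totals[g] > 0}

def recall_gap(per_group_recall):
    """Largest disparity between the best- and worst-served subgroups."""
    values = list(per_group_recall.values())
    return max(values) - min(values)
```

A large gap across, say, dialect or grade-level subgroups is the kind of disparity the paper reports.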

🛡️ Threat Analysis

Output Integrity Attack

Proposes an evaluation framework (BAID) for AI-generated text detection systems, revealing consistent reliability failures (low recall) on texts from underrepresented demographic groups, a direct assessment of the output integrity of AI content detectors.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
BAID (200k+ samples, introduced by authors)
Applications
ai-generated text detection, academic integrity tools, content moderation