Trace Is In Sentences: Unbiased Lightweight ChatGPT-Generated Text Detector

The widespread adoption of ChatGPT has raised concerns about its misuse, highlighting the need for robust detection of AI-generated text. Current word-level detectors are vulnerable to paraphrasing or simple prompts (PSP), suffer from biases induced by ChatGPT's word-level patterns (CWP) and training data content, degrade on modified text, and often require large models or online LLM interaction. To tackle these issues, we introduce a novel task to detect both original and PSP-modified AI-generated texts, and propose a lightweight framework that classifies texts based on their internal structure, which remains invariant under word-level changes. Our approach encodes sentence embeddings from pre-trained language models and models their relationships via attention. We employ contrastive learning to mitigate embedding biases from autoregressive generation and incorporate a causal graph with counterfactual methods to isolate structural features from topic-related biases. Experiments on two curated datasets, including abstract comparisons and revised life FAQs, validate the effectiveness of our method.
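To make the idea of modeling relationships between sentence embeddings via attention more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: it uses scaled dot-product self-attention with no learned projections, the function names (`sentence_attention`, `mean_pool`) are hypothetical, and the 4-dimensional vectors are made-up stand-ins for embeddings that would normally come from a pre-trained language model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sentence_attention(embeddings):
    """Scaled dot-product self-attention over sentence vectors.

    Assumption: no learned query/key/value projections, so this only
    illustrates the data flow, not a trained model.
    """
    d = len(embeddings[0])
    scale = math.sqrt(d)
    # weights[i][j]: how strongly sentence i attends to sentence j.
    weights = [softmax([dot(q, k) / scale for k in embeddings]) for q in embeddings]
    # Each output row is a weighted mix of all sentence vectors.
    mixed = [
        [sum(w * v[j] for w, v in zip(row, embeddings)) for j in range(d)]
        for row in weights
    ]
    return weights, mixed

def mean_pool(vectors):
    """Average the per-sentence vectors into one document vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

# Toy "sentence embeddings" (invented numbers, not model output).
doc = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.7, 0.6, 0.1],
]
weights, mixed = sentence_attention(doc)
doc_vector = mean_pool(mixed)
```

In a real detector, `doc_vector` would feed a small classification head. The point of the sketch is that the attention weights depend on how sentences relate to one another rather than on any individual word choice.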

Key Contributions

Identifies and formalizes word-level pattern (CWP) bias in ChatGPT text detectors via causal graph analysis, explaining vulnerability to paraphrase/simple-prompt (PSP) attacks
Proposes a lightweight detection framework encoding inter-sentence structural relations with contrastive learning and counterfactual causal methods to decouple structure from word-level and topic biases
Constructs and releases a large-scale multilingual benchmark (263,595 English + 76,503 Chinese samples) with PSP variants including cyclic translation, synonym substitution, and diverse prompts
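As an illustration of one PSP variant named above, the following sketch applies synonym substitution to a short text and checks that the sentence segmentation is unchanged. The synonym table and both helper functions are hypothetical and far simpler than whatever tooling was used to build the benchmark. It only shows why a word-level edit can leave sentence-level structure intact.

```python
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"big": "large", "quick": "fast", "shows": "demonstrates", "use": "employ"}

def substitute_synonyms(text, table=SYNONYMS):
    """Replace each word found in the table, preserving initial capitalization."""
    def swap(match):
        word = match.group(0)
        repl = table.get(word.lower())
        if repl is None:
            return word
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"[A-Za-z]+", swap, text)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

original = "The model is big. It shows quick gains. We use it daily."
modified = substitute_synonyms(original)

# Word-level content changes, but the number of sentences does not.
same_structure = len(split_sentences(original)) == len(split_sentences(modified))
```

A detector keyed to individual tokens sees `original` and `modified` as different inputs, whereas a sentence-level view sees the same three-sentence layout.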

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated text detection method that classifies whether text was produced by ChatGPT. This falls squarely under output integrity and content authenticity. The paper contributes a new detection architecture (inter-sentence attention, contrastive learning, causal counterfactual debiasing) rather than applying existing methods to a new domain.
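The contrastive learning component can be pictured with a generic InfoNCE-style loss. This summary does not say which pairs or which objective the paper uses, so the sketch rests on stated assumptions: the positive is taken to be a representation of the same text after a word-level modification, the negatives are representations of other texts, and `info_nce` and its `temperature` parameter are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax score of the positive pair.

    Assumption: this is a standard contrastive objective used for
    illustration, not necessarily the one used in the paper.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    # log-sum-exp over all candidates, computed stably.
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

anchor = [1.0, 0.0]
# Positive close to the anchor: low loss.
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Positive far from the anchor: high loss.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Minimizing such a loss pulls the anchor and its positive together and pushes the negatives away, which is the general mechanism by which contrastive training can reduce sensitivity to surface-level variation.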

Details

Domains

nlp

Model Types

transformer, llm

Threat Tags

inference_time, black_box

Datasets

HC3, Arxiv abstracts (custom), life FAQ dataset (custom)

Applications

2026 0 cit.

Similar Papers

Every Language Model Has a Forgery-Resistant Signature

SimKey: A Semantically Aware Key Module for Watermarking Language Models

SENTRA: Selected-Next-Token Transformer for LLM Text Detection

Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection

Black-box Detection of LLM-generated Text Using Generalized Jensen-Shannon Divergence

SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine

IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation

Variation is the Key: A Variation-Based Framework for LLM-Generated Text Detection