
EMMM, Explain Me My Model! Explainable Machine Generated Text Detection in Dialogues

Angela Yifei Yuan, Haoyi Li, Soyeon Caren Han, Christopher Leckie



Published on arXiv (arXiv:2508.18715)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

EMMM explanations are preferred by 70% of human evaluators over attribution baselines while achieving competitive detection accuracy and sub-1-second latency in real-time dialogue settings.

EMMM

Novel technique introduced


The rapid adoption of large language models (LLMs) in customer service introduces new risks, as malicious actors can exploit them to conduct large-scale user impersonation through machine-generated text (MGT). Current MGT detection methods often struggle in online conversational settings, reducing the reliability and interpretability essential for trustworthy AI deployment. In customer service scenarios where operators are typically non-expert users, explanations become crucial for trustworthy MGT detection. In this paper, we propose EMMM, an explanation-then-detection framework that balances latency, accuracy, and non-expert-oriented interpretability. Experimental results demonstrate that EMMM provides explanations accessible to non-expert users, with 70% of human evaluators preferring its outputs, while achieving accuracy competitive with state-of-the-art models and maintaining low latency, generating outputs within 1 second. Our code and dataset are open-sourced at https://github.com/AngieYYF/EMMM-explainable-chatbot-detection.


Key Contributions

  • EMMM: a dialogue-aware MGT detection framework using multi-level (turn + dialogue), multi-dimension (behavior + language), and multi-strategy (local NL explanations + semi-global visuals) detection grounded in speech act theory
  • Non-expert-oriented explanation reports combining highlighted features, natural language reasoning, and semi-global visualizations — preferred by 70% of human evaluators over attribution baselines
  • Online detection operating in under 1 second via a sequential selector–predictor pipeline with offline preprocessing, plus an open-sourced dataset for conversational MGT detection
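To make the sequential selector–predictor idea above concrete, here is a minimal sketch of an online per-turn detection pipeline: a lightweight selector extracts features from each dialogue turn, and a predictor flags machine-generated turns as the conversation unfolds. All names, features, and the scoring rule are illustrative assumptions, not the paper's implementation (which uses trained models grounded in speech act theory).

```python
# Hypothetical selector–predictor pipeline for turn-level MGT detection.
# The feature set and threshold rule are toy stand-ins for EMMM's
# trained components.
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str


def select_features(turn: Turn) -> dict:
    """Selector: extract lightweight behavioral/linguistic cues per turn."""
    words = turn.text.split()
    return {
        "turn_length": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "is_question": turn.text.strip().endswith("?"),
    }


def predict(features: dict, threshold: float = 12.0) -> bool:
    """Predictor: flag a turn as machine-generated.

    Toy linear score; a real system would use a trained classifier
    and return an explanation alongside the label.
    """
    score = features["turn_length"] * 0.5 + features["avg_word_len"] * 2.0
    return score > threshold


def detect_dialogue(turns: list[Turn]) -> list[bool]:
    # Sequential pipeline: the selector feeds the predictor turn by turn,
    # so labels are available online as each turn arrives.
    return [predict(select_features(t)) for t in turns]
```

Running the pipeline on a short dialogue flags the long, formulaic agent turn while passing the terse human turn, which mirrors the online, per-turn operation described above.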

🛡️ Threat Analysis

Output Integrity Attack

Paper directly addresses AI-generated content detection — specifically detecting LLM-generated text in conversational settings to counter user impersonation. The core contribution is a novel detection architecture (multi-level turn/dialogue detection, speech-act-theory integration) and explanation system, qualifying as a novel MGT detection framework rather than a mere domain application of existing detectors.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
ai-generated text detection, customer service chatbot detection, conversational dialogue monitoring