EMMM, Explain Me My Model! Explainable Machine Generated Text Detection in Dialogues
Angela Yifei Yuan, Haoyi Li, Soyeon Caren Han, Christopher Leckie
Published on arXiv (arXiv:2508.18715)
Output Integrity Attack
OWASP ML Top 10 (ML09)
Key Finding
EMMM explanations are preferred by 70% of human evaluators over attribution baselines while achieving competitive detection accuracy and sub-1-second latency in real-time dialogue settings.
EMMM
Novel technique introduced
The rapid adoption of large language models (LLMs) in customer service introduces new risks, as malicious actors can exploit them to conduct large-scale user impersonation through machine-generated text (MGT). Current MGT detection methods often struggle in online conversational settings, reducing the reliability and interpretability essential for trustworthy AI deployment. In customer service scenarios where operators are typically non-expert users, explanations become crucial for trustworthy MGT detection. In this paper, we propose EMMM, an explanation-then-detection framework that balances latency, accuracy, and non-expert-oriented interpretability. Experimental results demonstrate that EMMM provides explanations accessible to non-expert users, with 70% of human evaluators preferring its outputs, while achieving accuracy competitive with state-of-the-art models and maintaining low latency, generating outputs within 1 second. Our code and dataset are open-sourced at https://github.com/AngieYYF/EMMM-explainable-chatbot-detection.
Key Contributions
- EMMM: a dialogue-aware MGT detection framework grounded in speech act theory, combining multi-level (turn and dialogue), multi-dimension (behavior and language), and multi-strategy (local natural-language explanations plus semi-global visualizations) detection
- Non-expert-oriented explanation reports that combine highlighted features, natural-language reasoning, and semi-global visualizations, preferred by 70% of human evaluators over attribution baselines
- Online detection in under 1 second via a sequential selector-predictor pipeline with offline preprocessing, plus an open-sourced dataset for conversational MGT detection
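To make the "explanation-then-detection" ordering and the selector-predictor split concrete, here is a minimal sketch of how such a pipeline could be wired: a selector first picks the most indicative precomputed features for a turn (these become the explanation), and a predictor then classifies using only those selected features. All names (`Turn`, `FeatureSelector`, `Predictor`, `detect_turn`) and the scoring logic are illustrative assumptions, not EMMM's actual API or model.

```python
# Hedged sketch of an explanation-then-detection selector-predictor pipeline.
# Feature values are assumed to be precomputed offline (the paper's offline
# preprocessing step), so the online path is just select-then-predict.
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str
    features: dict  # precomputed offline, e.g. behavioral/linguistic cues


class FeatureSelector:
    """Keeps the top-k most indicative features; these form the explanation."""

    def __init__(self, top_k: int = 3):
        self.top_k = top_k

    def select(self, turn: Turn) -> dict:
        # rank precomputed feature scores by magnitude, keep the top-k
        ranked = sorted(turn.features.items(), key=lambda kv: abs(kv[1]), reverse=True)
        return dict(ranked[: self.top_k])


class Predictor:
    """Classifies a turn as human- or machine-generated from selected features."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def predict(self, selected: dict) -> tuple:
        # toy aggregation: mean of selected feature scores vs. a threshold
        score = sum(selected.values()) / max(len(selected), 1)
        return ("machine", score) if score >= self.threshold else ("human", score)


def detect_turn(turn: Turn, selector: FeatureSelector, predictor: Predictor) -> dict:
    # explanation first, then detection from exactly those explanatory features
    explanation = selector.select(turn)
    label, score = predictor.predict(explanation)
    return {"label": label, "score": round(score, 3), "explanation": explanation}


turn = Turn(
    speaker="agent",
    text="I appreciate your patience. Allow me to assist you further.",
    features={"formality": 0.9, "repetition": 0.7, "typo_rate": -0.2},
)
result = detect_turn(turn, FeatureSelector(top_k=2), Predictor())
```

Because the predictor only sees the selector's output, the explanation is faithful by construction: the highlighted features are, by definition, the ones the decision was based on. That sequential coupling (rather than post-hoc attribution) is the design choice the contribution list describes.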
🛡️ Threat Analysis
The paper directly addresses AI-generated content detection, specifically detecting LLM-generated text in conversational settings to counter user impersonation. Its core contribution is a novel detection architecture (multi-level turn/dialogue detection with speech act theory integration) and explanation system, qualifying it as a novel MGT detection framework rather than a mere domain application of existing detectors.