
EMMM, Explain Me My Model! Explainable Machine Generated Text Detection in Dialogues

Angela Yifei Yuan, Haoyi Li, Soyeon Caren Han, Christopher Leckie



Published on arXiv (arXiv:2508.18715)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

EMMM explanations are preferred by 70% of human evaluators over attribution baselines while achieving competitive detection accuracy and sub-1-second latency in real-time dialogue settings.

EMMM

Novel technique introduced


The rapid adoption of large language models (LLMs) in customer service introduces new risks, as malicious actors can exploit them to conduct large-scale user impersonation through machine-generated text (MGT). Current MGT detection methods often struggle in online conversational settings, reducing the reliability and interpretability essential for trustworthy AI deployment. In customer service scenarios where operators are typically non-expert users, explanations become crucial for trustworthy MGT detection. In this paper, we propose EMMM, an explanation-then-detection framework that balances latency, accuracy, and non-expert-oriented interpretability. Experimental results demonstrate that EMMM provides explanations accessible to non-expert users, with 70% of human evaluators preferring its outputs, while achieving accuracy competitive with state-of-the-art models and maintaining low latency, generating outputs within 1 second. Our code and dataset are open-sourced at https://github.com/AngieYYF/EMMM-explainable-chatbot-detection.


Key Contributions

  • EMMM: a dialogue-aware MGT detection framework using multi-level (turn + dialogue), multi-dimension (behavior + language), and multi-strategy (local NL explanations + semi-global visuals) detection grounded in speech act theory
  • Non-expert-oriented explanation reports combining highlighted features, natural language reasoning, and semi-global visualizations — preferred by 70% of human evaluators over attribution baselines
  • Online detection operating in under 1 second via a sequential selector–predictor pipeline with offline preprocessing, plus an open-sourced dataset for conversational MGT detection
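To make the sequential selector–predictor idea above concrete, here is a minimal sketch of an online per-turn detection pipeline: a lightweight selector extracts features from each dialogue turn, and a predictor flags machine-generated turns as the conversation unfolds. All names, features, and the scoring rule are illustrative assumptions, not the paper's implementation (which uses trained models grounded in speech act theory).

```python
# Hypothetical selector–predictor pipeline for turn-level MGT detection.
# The feature set and threshold rule are toy stand-ins for EMMM's
# trained components.
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str


def select_features(turn: Turn) -> dict:
    """Selector: extract lightweight behavioral/linguistic cues per turn."""
    words = turn.text.split()
    return {
        "turn_length": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "is_question": turn.text.strip().endswith("?"),
    }


def predict(features: dict, threshold: float = 12.0) -> bool:
    """Predictor: flag a turn as machine-generated.

    Toy linear score; a real system would use a trained classifier
    and return an explanation alongside the label.
    """
    score = features["turn_length"] * 0.5 + features["avg_word_len"] * 2.0
    return score > threshold


def detect_dialogue(turns: list[Turn]) -> list[bool]:
    # Sequential pipeline: the selector feeds the predictor turn by turn,
    # so labels are available online as each turn arrives.
    return [predict(select_features(t)) for t in turns]
```

Running the pipeline on a short dialogue flags the long, formulaic agent turn while passing the terse human turn, which mirrors the online, per-turn operation described above.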

🛡️ Threat Analysis

Output Integrity Attack

Paper directly addresses AI-generated content detection — specifically detecting LLM-generated text in conversational settings to counter user impersonation. The core contribution is a novel detection architecture (multi-level turn/dialogue detection, speech-act-theory integration) and explanation system, qualifying as a novel MGT detection framework rather than a mere domain application of existing detectors.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Applications
ai-generated text detection, customer service chatbot detection, conversational dialogue monitoring