defense 2025

Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text

Zixin Rao 1, Youssef Mohamed 2, Shang Liu 3, Zeyan Liu 3


Published on arXiv: 2508.14190

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Multi-task learning boosts performance on both detection and authorship attribution simultaneously while generalizing across multiple languages and remaining robust to adversarial obfuscation.

DA-MTL

Novel technique introduced


Large Language Models (LLMs), such as GPT-4 and Llama, have demonstrated remarkable abilities in generating natural language. However, they also pose security and integrity challenges. Existing countermeasures primarily focus on distinguishing AI-generated content from human-written text, with most solutions tailored for English. Meanwhile, authorship attribution (determining which specific LLM produced a given text) has received comparatively little attention despite its importance in forensic analysis. In this paper, we present DA-MTL, a multi-task learning framework that simultaneously addresses both text detection and authorship attribution. We evaluate DA-MTL on nine datasets and four backbone models, demonstrating its strong performance across multiple languages and LLM sources. Our framework captures each task's unique characteristics and shares insights between them, which boosts performance in both tasks. Additionally, we conduct a thorough analysis of cross-modal and cross-lingual patterns and assess the framework's robustness against adversarial obfuscation techniques. Our findings offer valuable insights into LLM behavior and the generalization of both detection and authorship attribution.


Key Contributions

  • DA-MTL: a multi-task learning framework that simultaneously performs LLM-generated text detection and LLM authorship attribution with shared representations
  • Cross-lingual and cross-modal evaluation across nine datasets and four backbone models demonstrating generalization beyond English
  • Robustness analysis of the detection/attribution framework against adversarial obfuscation techniques
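The shared-representation idea behind DA-MTL can be illustrated with a minimal sketch: a single feature vector from a shared encoder feeds two task-specific heads, one for binary detection (human vs. LLM) and one for multi-class attribution (which LLM), trained with a joint loss. This is an illustrative assumption of the general multi-task pattern, not the paper's actual architecture: the encoder is stubbed with a random vector, and the layer sizes, head shapes, and loss weighting `lam` are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 16, 4          # feature dim and number of candidate LLM authors (assumed sizes)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -np.log(probs[label] + 1e-12)

# Shared representation: in DA-MTL this would come from a transformer
# backbone; here it is stubbed with a random vector for one input text.
h = rng.normal(size=D)

# Two task-specific heads on top of the same shared features.
W_det = rng.normal(size=(2, D)) * 0.1   # head 1: human vs. LLM-generated
W_att = rng.normal(size=(K, D)) * 0.1   # head 2: which of K LLMs wrote it

p_det = softmax(W_det @ h)
p_att = softmax(W_att @ h)

# Joint objective: detection loss plus weighted attribution loss.
# The weighting scheme is an assumption, not taken from the paper.
lam = 0.5
y_det, y_att = 1, 2                      # toy ground-truth labels
loss = cross_entropy(p_det, y_det) + lam * cross_entropy(p_att, y_att)
```

Because both heads backpropagate into the same encoder during training, signal useful for one task (e.g. stylistic cues that identify a specific LLM) can sharpen the shared features used by the other, which is the mechanism the paper credits for the mutual performance boost.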

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection (distinguishing LLM-generated from human-written text) and authorship attribution (identifying which LLM produced a text) — both are output integrity and content provenance concerns. The paper proposes a novel detection architecture (multi-task learning) rather than merely applying existing methods to a new domain.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time
Datasets
nine datasets (specific names not available in provided LaTeX body)
Applications
ai-generated text detection, llm authorship attribution, forensic analysis