Benchmark · arXiv · Feb 9, 2026
Janek Bevendorff, Maik Fröbe, André Greiner-Petter et al. · Bauhaus-Universität Weimar · Friedrich Schiller University Jena +8 more
A benchmark workshop organizing five shared tasks spanning AI-text detection, watermarking robustness, and safety evaluation of LLM reasoning.
The goal of the PAN workshop is to advance computational stylometry and text forensics via objective and reproducible evaluation. In 2026, we run the following five tasks: (1) Voight-Kampff Generative AI Detection, particularly in mixed and obfuscated authorship scenarios, (2) Text Watermarking, a new task that aims to develop new text watermarking schemes and benchmark the robustness of existing ones, (3) Multi-author Writing Style Analysis, a continued task that aims to find positions of authorship change, (4) Generative Plagiarism Detection, a continued task that targets source retrieval and text alignment between generated text and source documents, and (5) Reasoning Trajectory Detection, a new task concerned with source and safety detection for reasoning trajectories, whether LLM-generated or human-written. As in previous years, PAN invites software submissions as easy-to-reproduce Docker containers for most of the tasks. Since PAN 2012, more than 1,100 submissions have been made this way via the TIRA experimentation platform.
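The Docker-based submission workflow can be illustrated with a minimal sketch. The entry-point script name (`detect.py`) and the mounted `/input` and `/output` paths below are assumptions for illustration only, not the actual TIRA interface; participants should consult the official task and TIRA documentation for the real I/O contract.

```dockerfile
# Hypothetical container for a PAN software submission (illustrative sketch).
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY detect.py .
# Assumed convention: read a dataset directory, write predictions to an
# output directory; both paths are placeholders, not the TIRA specification.
ENTRYPOINT ["python", "detect.py", "--input", "/input", "--output", "/output"]
```

Packaging the system this way pins its dependencies inside the image, which is what makes the submission reproducible by the organizers long after the shared task ends.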