A Comprehensive Dataset for Human vs. AI Generated Image Detection
Rajarshi Roy, Nasrin Imanpour, Ashhar Aziz et al. · Kalyani Government Engineering College · AI Institute USC +12 more
Rajarshi Roy, Nasrin Imanpour, Ashhar Aziz et al. · Kalyani Government Engineering College · AI Institute USC +12 more
Releases MS COCOAI, a 96K-image benchmark for detecting AI-generated images and attributing them to specific generative models
Multimodal generative AI systems like Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the spread of misleading content, false information, and manipulated media. As generated images become harder to distinguish from photographs, detecting them has become an urgent priority. To combat this challenge, We release MS COCOAI, a novel dataset for AI generated image detection consisting of 96000 real and synthetic datapoints, built using the MS COCO dataset. To generate synthetic images, we use five generators: Stable Diffusion 3, Stable Diffusion 2.1, SDXL, DALL-E 3, and MidJourney v6. Based on the dataset, we propose two tasks: (1) classifying images as real or generated, and (2) identifying which model produced a given synthetic image. The dataset is available at https://huggingface.co/datasets/Rajarshi-Roy-research/Defactify_Image_Dataset.
Sai Teja Lekkala, Yadagiri Annepaka, Arun Kumar Challa et al. · NIT Silchar
Benchmark dataset and multi-task evaluation framework for AI-generated text detection covering classification, attribution, adversarial robustness, and mixed-authorship segmentation
Large Language Models (LLMs) are gearing up to surpass human creativity. The veracity of the statement needs careful consideration. In recent developments, critical questions arise regarding the authenticity of human work and the preservation of their creativity and innovative abilities. This paper investigates such issues. This paper addresses machine-generated text detection across several scenarios, including document-level binary and multiclass classification or generator attribution, sentence-level segmentation to differentiate between human-AI collaborative text, and adversarial attacks aimed at reducing the detectability of machine-generated text. We introduce a new work called BMAS English: an English language dataset for binary classification of human and machine text, for multiclass classification, which not only identifies machine-generated text but can also try to determine its generator, and Adversarial attack addressing where it is a common act for the mitigation of detection, and Sentence-level segmentation, for predicting the boundaries between human and machine-generated text. We believe that this paper will address previous work in Machine-Generated Text Detection (MGTD) in a more meaningful way.