PRADA: Probability-Ratio-Based Attribution and Detection of Autoregressive-Generated Images

Autoregressive (AR) image generation has recently emerged as a powerful paradigm for image synthesis. Leveraging the generation principle of large language models, AR models can efficiently generate deceptively real-looking images, which further increases the need for reliable detection methods. However, to date there is a lack of work specifically targeting the detection of images generated by AR image generators. In this work, we present PRADA (Probability-Ratio-Based Attribution and Detection of Autoregressive-Generated Images), a simple and interpretable approach that can reliably detect AR-generated images and attribute them to their respective source model. The key idea is to inspect the ratio of a model's conditional to unconditional probability for the autoregressive token sequence representing a given image. Whenever an image is generated by a particular model, its probability ratio shows unique characteristics that are not present for real images or for images generated by other models. We exploit these characteristics for threshold-based attribution and detection by calibrating a simple, model-specific score function. Our experimental evaluation shows that PRADA is highly effective against eight class-to-image and four text-to-image models.
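The probability ratio described above can be illustrated with a minimal sketch. It assumes the image has already been tokenized and that the candidate model has produced next-token logits twice, once with the conditioning (class label or prompt) and once without. The function names, the use of log-ratios, and the mean aggregation are illustrative assumptions, not the score function used in the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def log_prob_ratios(
    tokens: Sequence[int],
    cond_logits: Sequence[Sequence[float]],
    uncond_logits: Sequence[Sequence[float]],
) -> List[float]:
    """Per-token log p(x_i | x_<i, c) - log p(x_i | x_<i).

    cond_logits[i] and uncond_logits[i] are the (hypothetical) model outputs
    for position i with and without the condition c.
    """
    ratios = []
    for tok, lc, lu in zip(tokens, cond_logits, uncond_logits):
        ratios.append(log_softmax(lc)[tok] - log_softmax(lu)[tok])
    return ratios


def mean_log_ratio(ratios: Sequence[float]) -> float:
    """Assumed aggregation: average the per-token log-ratios into one number."""
    return sum(ratios) / len(ratios)


# Toy example: 2 image tokens, vocabulary of size 3.
tokens = [0, 2]
cond = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]    # condition favors the observed tokens
uncond = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # uniform without the condition
ratios = log_prob_ratios(tokens, cond, uncond)
```

In this toy input the condition raises the probability of both observed tokens, so both log-ratios are positive. How such ratios behave on real images versus generated ones is an empirical question that the paper addresses.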

Key Contributions

Introduces PRADA, a detection and attribution method exploiting the ratio of conditional to unconditional token probabilities as a model-specific signature for AR-generated images
Demonstrates that each AR image generator leaves unique probability-ratio characteristics absent in real images or images from other models, enabling threshold-based attribution
Evaluates PRADA across 12 AR image generation models (8 class-to-image, 4 text-to-image), showing broad effectiveness
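Threshold-based attribution, as named in the contributions above, might look like the following sketch. It assumes each candidate model yields a scalar score per image (higher meaning "more likely mine") and that each threshold is calibrated on scores of images the model did not generate. The quantile rule, the margin-based tie-break, and the model names are hypothetical choices for illustration; the summary does not specify how PRADA calibrates its score function.

```python
from typing import Dict, List, Optional


def calibrate_threshold(negative_scores: List[float], max_fpr: float = 0.05) -> float:
    """Pick a threshold so that at most max_fpr of the negatives score strictly above it."""
    if not negative_scores or not 0.0 <= max_fpr < 1.0:
        raise ValueError("need at least one negative score and 0 <= max_fpr < 1")
    ranked = sorted(negative_scores)
    allowed = int(max_fpr * len(ranked))  # negatives tolerated above the threshold
    return ranked[len(ranked) - 1 - allowed]


def attribute(scores: Dict[str, float], thresholds: Dict[str, float]) -> Optional[str]:
    """Return the model whose score exceeds its threshold by the largest margin.

    Returns None when no score exceeds its threshold, i.e. the image is
    treated as real or as coming from an unknown generator.
    """
    best, best_margin = None, 0.0
    for name, score in scores.items():
        margin = score - thresholds[name]
        if margin > best_margin:
            best, best_margin = name, margin
    return best


# Hypothetical calibration data and scores for two candidate models.
thresholds = {
    "model_a": calibrate_threshold([0.1, 0.2, 0.3, 0.4], max_fpr=0.0),
    "model_b": calibrate_threshold([0.1, 0.2, 0.3, 0.4], max_fpr=0.25),
}
decision = attribute({"model_a": 0.9, "model_b": 0.2}, thresholds)
```

Detection then reduces to checking whether any model claims the image, while attribution reports which one does.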

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel detection and attribution method for AI-generated images from AR generators, directly addressing output integrity and content provenance, a core ML09 concern. The technique determines whether an image was generated by a specific AR model, using probability-ratio signatures unique to each model.

Details

Domains

vision, generative

Model Types

transformer, generative

Threat Tags

inference_time, black_box

Applications

2026, 0 citations
