defense 2025

PRADA: Probability-Ratio-Based Attribution and Detection of Autoregressive-Generated Images

Simon Damm , Jonas Ricker , Henning Petzka , Asja Fischer

0 citations · 86 references · arXiv

Published on arXiv

2511.20068

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PRADA reliably detects and attributes AR-generated images across 12 models using a simple calibrated score function based on conditional/unconditional probability ratios.

PRADA

Novel technique introduced


Autoregressive (AR) image generation has recently emerged as a powerful paradigm for image synthesis. Because AR image generators borrow the generation principle of large language models, they can efficiently produce deceptively realistic images, which further increases the need for reliable detection methods. To date, however, there is a lack of work specifically targeting the detection of images produced by AR image generators. In this work, we present PRADA (Probability-Ratio-Based Attribution and Detection of Autoregressive-Generated Images), a simple and interpretable approach that reliably detects AR-generated images and attributes them to their source model. The key idea is to inspect the ratio of a model's conditional and unconditional probability for the autoregressive token sequence representing a given image. When an image was generated by a particular model, its probability ratio shows unique characteristics that are absent for real images and for images generated by other models. We exploit these characteristics for threshold-based attribution and detection by calibrating a simple, model-specific score function. Our experimental evaluation shows that PRADA is highly effective against eight class-to-image and four text-to-image models.
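The probability-ratio idea in the abstract can be illustrated with a minimal sketch. It assumes that, for an image already tokenized into a sequence of T token ids, the candidate model provides per-position logits of shape (T, V) both with conditioning (class label or text prompt) and without it. The function names, the array layout, and the use of a mean per-token log ratio as a summary statistic are illustrative assumptions; the summary does not specify the calibrated score function the authors use.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def token_log_probs(logits, tokens):
    """Log-probability assigned to each observed token.

    logits: array of shape (T, V), one row of vocabulary logits per position.
    tokens: integer array of shape (T,), the image's token sequence.
    """
    logits = np.asarray(logits, dtype=np.float64)
    tokens = np.asarray(tokens)
    log_probs = log_softmax(logits)
    return log_probs[np.arange(tokens.shape[0]), tokens]

def log_prob_ratio(cond_logits, uncond_logits, tokens):
    """Per-token log of p(token | condition) / p(token | no condition)."""
    return token_log_probs(cond_logits, tokens) - token_log_probs(uncond_logits, tokens)

def mean_log_ratio(cond_logits, uncond_logits, tokens):
    """Hypothetical summary statistic: average per-token log ratio."""
    return float(log_prob_ratio(cond_logits, uncond_logits, tokens).mean())
```

Working in log space avoids underflow, since an image token sequence can contain hundreds or thousands of tokens and the raw product of probabilities would be vanishingly small.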


Key Contributions

  • Introduces PRADA, a detection and attribution method exploiting the ratio of conditional to unconditional token probabilities as a model-specific signature for AR-generated images
  • Demonstrates that each AR image generator leaves unique probability-ratio characteristics absent in real images or images from other models, enabling threshold-based attribution
  • Evaluates PRADA across 12 AR image generation models (8 class-to-image, 4 text-to-image), showing broad effectiveness
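Threshold-based attribution as described in the contributions above could be sketched as follows. This is an illustration under stated assumptions, not the authors' calibration procedure: it assumes each candidate model yields a scalar score for an image, that higher scores indicate the image came from that model, and that a per-model threshold is chosen from scores of images known not to come from that model at a fixed target false-positive rate. The helper names and the `target_fpr` parameter are hypothetical.

```python
import numpy as np

def calibrate_threshold(negative_scores, target_fpr=0.01):
    """Pick a threshold so that roughly target_fpr of negatives exceed it.

    negative_scores: scores of images NOT generated by the model
    (real images or images from other generators).
    """
    scores = np.asarray(negative_scores, dtype=np.float64)
    return float(np.quantile(scores, 1.0 - target_fpr))

def attribute(scores_by_model, thresholds):
    """Return the names of all models whose score exceeds their threshold."""
    return sorted(
        name for name, score in scores_by_model.items()
        if score > thresholds[name]
    )

def detect(scores_by_model, thresholds):
    """Flag an image as AR-generated if any candidate model claims it."""
    return len(attribute(scores_by_model, thresholds)) > 0
```

For example, with thresholds calibrated for two hypothetical models `"model_a"` and `"model_b"`, `attribute({"model_a": 1.5, "model_b": 0.2}, thresholds)` returns the models whose scores clear their thresholds, and an empty list means the image is treated as real or as coming from an unknown generator.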

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel detection and attribution method for AI-generated images (from AR generators), directly addressing output integrity and content provenance — a core ML09 concern. The technique identifies whether an image was generated by a specific AR model using probability-ratio signatures unique to each model.


Details

Domains
vision, generative
Model Types
transformer, generative
Threat Tags
inference_time, black_box
Applications
ai-generated image detection, image source attribution, synthetic image forensics