D³QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

The emergence of visual autoregressive (AR) models has revolutionized image generation while presenting new challenges for synthetic image detection. Unlike earlier GAN- or diffusion-based methods, AR models generate images through discrete token prediction, exhibiting both marked improvements in image synthesis quality and unique characteristics in their vector-quantized representations. In this paper, we propose Discrete Distribution Discrepancy-aware Quantization Error (D³QE) for detecting autoregressive-generated images. The method exploits the distinctive patterns and the codebook frequency distribution bias that separate real images from fake ones. We introduce a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic features with quantization error latents. To evaluate our method, we construct a comprehensive dataset, termed ARForensics, covering 7 mainstream visual AR models. Experiments demonstrate superior detection accuracy and strong generalization of D³QE across different AR models, with robustness to real-world perturbations. Code is available at https://github.com/Zhangyr2022/D3QE.

Key Contributions

Identifies and exploits discrete distribution discrepancy and codebook frequency bias as discriminative forensic features unique to autoregressive-generated images.
Proposes a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic and quantization error features.
Constructs ARForensics, a benchmark dataset covering 7 mainstream visual AR models for evaluating synthetic image detection.
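
The two signals named in the contributions above, quantization error and codebook frequency statistics, can be illustrated with a toy vector quantizer. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 4-entry codebook, the sample latents, and the use of total variation distance as the discrepancy measure are all hypothetical choices made for illustration.

```python
import numpy as np


def quantize(latents, codebook):
    """Nearest-neighbour vector quantization (illustrative, not the paper's code).

    latents:  (N, D) continuous latent vectors
    codebook: (K, D) code vectors
    Returns the chosen code index per latent and the quantization error.
    """
    # Squared Euclidean distance between every latent and every code: (N, K)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    # Quantization error: what is lost when a latent snaps to its nearest code
    error = latents - codebook[idx]
    return idx, error


def code_frequencies(idx, num_codes):
    """Normalized histogram of how often each codebook entry is used."""
    counts = np.bincount(idx, minlength=num_codes)
    return counts / counts.sum()


def frequency_discrepancy(freq_a, freq_b):
    """Total variation distance between two code-usage distributions.

    Assumption: the source does not state which discrepancy measure is used;
    total variation is chosen here only because it is simple.
    """
    return 0.5 * np.abs(freq_a - freq_b).sum()


# Hypothetical 2-D codebook with four entries and four sample latents
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
latents = np.array([[0.1, 0.0], [0.9, 0.2], [0.1, 0.8], [1.0, 1.0]])

idx, error = quantize(latents, codebook)
freq_demo = code_frequencies(idx, num_codes=len(codebook))
```

In this toy setting, a latent that already lies on a code vector has zero quantization error, while code-usage histograms gathered from two image populations can be compared with `frequency_discrepancy`. How D³QE actually combines these signals inside its transformer is described in the paper and the linked repository, not here.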

🛡️ Threat Analysis

Output Integrity Attack

Proposes a novel AI-generated image detection method (D³QE) engineered to detect synthetic images from visual autoregressive models by exploiting the discrete distribution discrepancies and quantization error patterns in their VQ codebook representations. It serves as a forensic technique for output integrity and content authenticity.

Details

Domains

vision, generative

Model Types

transformer

Threat Tags

inference_time, digital

Datasets

ARForensics

Applications

Similar Papers

All in One: Unifying Deepfake Detection, Tampering Localization, and Source Tracing with a Robust Landmark-Identity Watermark

CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization

End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection

Learning to Watermark in the Latent Space of Generative Models

Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection

Detecting AI-Generated Forgeries via Iterative Manifold Deviation Amplification

Towards Robust Red-Green Watermarking for Autoregressive Image Generators

Towards Transferable Defense Against Malicious Image Edits