D³QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
Published on arXiv
2510.05891
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
D³QE achieves superior detection accuracy and strong generalization across 7 autoregressive visual models with robustness to real-world perturbations.
D³QE (Discrete Distribution Discrepancy-aware Quantization Error)
Novel technique introduced
The emergence of visual autoregressive (AR) models has revolutionized image generation while presenting new challenges for synthetic image detection. Unlike earlier GAN- or diffusion-based methods, AR models generate images through discrete token prediction, exhibiting both marked improvements in image synthesis quality and unique characteristics in their vector-quantized representations. In this paper, we propose to leverage Discrete Distribution Discrepancy-aware Quantization Error (D³QE) for autoregressive-generated image detection, exploiting the distinctive patterns and the codebook frequency distribution bias that differ between real and fake images. We introduce a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic features with the quantization error latent. To evaluate our method, we construct a comprehensive dataset termed ARForensics covering 7 mainstream visual AR models. Experiments demonstrate superior detection accuracy and strong generalization of D³QE across different AR models, with robustness to real-world perturbations. Code is available at https://github.com/Zhangyr2022/D3QE.
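To make the two quantities in the abstract concrete, the following is a minimal sketch of what "quantization error" and "codebook frequency statistics" can mean for a vector-quantized representation. It is not taken from the authors' repository: the function names, the toy 2-D codebook, and the nearest-neighbour (squared L2) assignment are all assumptions chosen for illustration, and the numbers are toy values rather than results.

```python
from collections import Counter


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, entry)) for entry in codebook]
    return dists.index(min(dists))


def quantization_error(latents, codebook):
    """Quantize each latent vector; return (code indices, residual vectors)."""
    indices = [nearest_code(z, codebook) for z in latents]
    # Residual between the continuous latent and its assigned codebook entry.
    errors = [
        [v - c for v, c in zip(z, codebook[i])]
        for z, i in zip(latents, indices)
    ]
    return indices, errors


def codebook_frequencies(indices, codebook_size):
    """Normalized usage histogram over codebook entries."""
    counts = Counter(indices)
    total = len(indices)
    return [counts.get(k, 0) / total for k in range(codebook_size)]


# Toy data (hypothetical): 3 codebook entries, 4 latent vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [1.9, 0.1]]

indices, errors = quantization_error(latents, codebook)
freqs = codebook_frequencies(indices, len(codebook))
```

In this sketch, `errors` plays the role of the quantization error signal and `freqs` the role of the codebook usage statistics; how D³QE actually extracts and combines them is defined in the paper and its released code.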
Key Contributions
- Identifies and exploits discrete distribution discrepancy and codebook frequency bias as discriminative forensic features unique to autoregressive-generated images.
- Proposes a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic and quantization error features.
- Constructs ARForensics, a benchmark dataset covering 7 mainstream visual AR models for evaluating synthetic image detection.
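The first contribution rests on the idea that real and generated images use codebook entries with different frequencies. As a rough illustration only, the sketch below scores the gap between two usage histograms with the Jensen-Shannon divergence. This measure is an assumption swapped in for clarity, not the discrepancy formulation used in the paper, and the two histograms are made-up toy values.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical codebook usage histograms over 4 entries.
real_freqs = [0.25, 0.25, 0.25, 0.25]  # codes used evenly
fake_freqs = [0.70, 0.10, 0.10, 0.10]  # usage concentrated on one code

score = js_divergence(real_freqs, fake_freqs)
```

A score of zero means the two histograms are identical, and larger values mean the codebook usage differs more strongly.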
🛡️ Threat Analysis
Proposes a novel AI-generated image detection method (D³QE) engineered specifically to detect synthetic images from visual autoregressive models. It exploits the discrete distribution discrepancies and quantization error patterns in their VQ codebook representations, making it a forensic technique for output integrity and content authenticity.