Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection

Yao Xiao 1, Weiyan Chen 1, Jiahao Chen 2, Zijie Cao 1, Weijian Deng 3, Binbin Yang 1, Ziyi Dong 1, Xiangyang Ji 4, Wei Ke 2, Pengxu Wei 1,5, Liang Lin 1,5

0 citations · 72 references · arXiv

Published on arXiv: 2601.19430

Output Integrity Attack

OWASP ML Top 10: ML09

Key Finding

Existing AIGI detectors largely bypass perceptual artifacts; explicitly aligning model attention with artifact regions significantly boosts cross-dataset generalization while improving interpretability.

X-AIGD

Novel benchmark introduced


Current AI-Generated Image (AIGI) detection approaches predominantly rely on binary classification to distinguish real from synthetic images, often lacking interpretable or convincing evidence to substantiate their decisions. This limitation stems from existing AIGI detection benchmarks, which, despite featuring a broad collection of synthetic images, remain restricted in their coverage of artifact diversity and lack detailed, localized annotations. To bridge this gap, we introduce a fine-grained benchmark towards eXplainable AI-Generated image Detection, named X-AIGD, which provides pixel-level, categorized annotations of perceptual artifacts, spanning low-level distortions, high-level semantics, and cognitive-level counterfactuals. These comprehensive annotations facilitate fine-grained interpretability evaluation and deeper insight into model decision-making processes. Our extensive investigation using X-AIGD provides several key insights: (1) Existing AIGI detectors demonstrate negligible reliance on perceptual artifacts, even at the most basic distortion level. (2) While AIGI detectors can be trained to identify specific artifacts, they still substantially base their judgment on uninterpretable features. (3) Explicitly aligning model attention with artifact regions can increase the interpretability and generalization of detectors. The data and code are available at: https://github.com/Coxy7/X-AIGD.


Key Contributions

  • X-AIGD benchmark providing pixel-level, categorized perceptual artifact annotations spanning low-level distortions, high-level semantics, and cognitive-level counterfactuals for interpretable AIGI detection evaluation
  • Empirical finding that existing AIGI detectors show negligible reliance on human-interpretable perceptual artifacts even at the basic distortion level
  • Demonstration that explicitly aligning model attention with annotated artifact regions improves both interpretability and cross-dataset generalization
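The attention-alignment idea in the last contribution can be sketched as an auxiliary loss that penalizes attention mass falling outside the annotated artifact region. The function below is a hypothetical, framework-free illustration (the name `attention_alignment_loss` and the exact formulation are assumptions, not the paper's actual loss): it normalizes a flat attention map and measures the fraction of attention outside the binary artifact mask.

```python
def attention_alignment_loss(attention, mask):
    """Penalize attention mass outside the annotated artifact region.

    attention: flat list of non-negative attention scores per pixel/patch.
    mask:      flat list of 0/1 artifact labels of the same length.
    Returns a value in [0, 1]: 0 when all attention lies on the artifact,
    1 when none of it does. (Hypothetical sketch, not the paper's loss.)
    """
    total = sum(attention)
    if total == 0:
        return 0.0  # no attention mass; nothing to penalize
    inside = sum(a for a, m in zip(attention, mask) if m == 1)
    return 1.0 - inside / total
```

During training, a term like this would be added to the usual real/fake classification loss so that gradients pull the detector's attention toward the pixel-level artifact annotations, which X-AIGD's localized labels make possible:

```python
# Attention concentrated on the artifact region -> zero penalty
attention_alignment_loss([0.0, 0.9, 0.1, 0.0], [0, 1, 1, 0])  # → 0.0
```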

🛡️ Threat Analysis

Output Integrity Attack

This work addresses output integrity from the defensive side: by providing a fine-grained evaluation benchmark with pixel-level artifact annotations across three artifact levels, it supports rigorous measurement of AI-generated image detectors and, in turn, of content authenticity and provenance verification systems.


Details

Domains
vision
Model Types
diffusion, cnn, transformer, vlm
Threat Tags
inference_time
Datasets
X-AIGD
Applications
ai-generated image detection, deepfake detection, image forensics