
AI-Generated Image Detectors Overrely on Global Artifacts: Evidence from Inpainting Exchange

Elif Nebioglu 1, Emirhan Bilgiç 2,3, Adrian Popescu 4

0 citations · 59 references · arXiv


Published on arXiv

2602.00192

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Removing global VAE artifacts via INP-X causes state-of-the-art detectors—including commercial APIs (HiveModeration, Sightengine)—to drop from >91% accuracy to ~55%, approaching random chance, demonstrating that detectors exploit spectral shortcuts rather than local synthesized content.

INP-X (Inpainting Exchange)

Novel technique introduced


Modern deep learning-based inpainting enables realistic local image manipulation, raising critical challenges for reliable detection. However, we observe that current detectors primarily rely on global artifacts that appear as inpainting side effects, rather than on locally synthesized content. We show that this behavior occurs because VAE-based reconstruction induces a subtle but pervasive spectral shift across the entire image, including unedited regions. To isolate this effect, we introduce Inpainting Exchange (INP-X), an operation that restores original pixels outside the edited region while preserving all synthesized content. We create a 90K test dataset including real, inpainted, and exchanged images to evaluate this phenomenon. Under this intervention, pretrained state-of-the-art detectors, including commercial ones, exhibit a dramatic drop in accuracy (e.g., from 91% to 55%), frequently approaching chance level. We provide a theoretical analysis linking this behavior to high-frequency attenuation caused by VAE information bottlenecks. Our findings highlight the need for content-aware detection. Indeed, training on our dataset yields better generalization and localization than standard inpainting. Our dataset and code are publicly available at https://github.com/emirhanbilgic/INP-X.
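The exchange operation described above amounts to a masked composite: keep the synthesized pixels inside the edit mask, and restore the original pixels everywhere else. A minimal sketch in NumPy (the function name, array shapes, and mask convention are illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np

def inpainting_exchange(original, inpainted, mask):
    """Sketch of the INP-X idea: preserve synthesized content inside the
    edited region while restoring untouched original pixels outside it,
    which strips the global artifacts the inpainting pipeline adds
    outside the edit.

    original, inpainted: (H, W, C) arrays of the same dtype/shape.
    mask: (H, W) array, 1 inside the edited region, 0 outside
    (convention assumed here).
    """
    m = mask[..., None].astype(original.dtype)  # broadcast mask over channels
    return inpainted * m + original * (1.0 - m)
```

Fed to a detector, such an exchanged image contains only the locally synthesized content as evidence, so any accuracy drop relative to the full inpainted image can be attributed to global artifacts.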


Key Contributions

  • INP-X (Inpainting Exchange) operation that surgically restores original pixels outside the edited region to isolate synthesized content and reveal detector over-reliance on global VAE-induced spectral artifacts
  • 90K-image benchmark dataset with real/inpainted/exchanged triplets spanning 4 datasets and 3 inpainting models, used to evaluate 11 pretrained detectors and 2 commercial APIs
  • Theoretical analysis linking high-frequency attenuation from VAE information bottlenecks to the global spectral shift that detectors exploit as a shortcut, plus evidence that training on INP-X images improves cross-distribution generalization and localization
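The high-frequency attenuation attributed to VAE bottlenecks can be probed with a simple spectral statistic: the fraction of image energy above a radial frequency cutoff. This is a generic probe, not the paper's exact analysis; the cutoff value and grayscale input are assumptions. A VAE round-trip of an image should lower this ratio across the whole frame, including unedited regions, which is the global shortcut signal detectors can latch onto.

```python
import numpy as np

def high_freq_energy_ratio(img, cutoff=0.25):
    """Fraction of spectral power above a normalized radial frequency
    cutoff. img: 2-D grayscale array; cutoff: radius in (0, 0.5],
    measured in cycles per pixel along each axis.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(F) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized distance from the spectrum's center (DC component).
    r = np.sqrt(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2)
    return power[r > cutoff].sum() / power.sum()
```

Comparing this ratio between an original image and its inpainted counterpart, restricted to unedited regions, would surface the pervasive spectral shift the paper describes.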

🛡️ Threat Analysis

Output Integrity Attack

The paper directly targets AI-generated content detection integrity: it reveals that state-of-the-art inpainting detectors exploit global spectral artifacts (VAE fingerprints) rather than locally synthesized content, uses INP-X to defeat these detectors, and proposes improved training methodology for robust content-authenticity detection.


Details

Domains
vision · generative
Model Types
diffusion · cnn · transformer
Threat Tags
black_box · inference_time · digital
Datasets
Semi-Truths · INP-X (authors' 90K benchmark)
Applications
ai-generated image detection · inpainting detection · content authenticity verification