
AI-Generated Image Detectors Overrely on Global Artifacts: Evidence from Inpainting Exchange

Elif Nebioglu 1, Emirhan Bilgiç 2,3, Adrian Popescu 4

0 citations · 59 references · arXiv


Published on arXiv

2602.00192

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Removing global VAE artifacts via INP-X causes state-of-the-art detectors—including commercial APIs (HiveModeration, Sightengine)—to drop from >91% accuracy to ~55%, approaching random chance, demonstrating that detectors exploit spectral shortcuts rather than local synthesized content.

INP-X (Inpainting Exchange)

Novel technique introduced


Modern deep learning-based inpainting enables realistic local image manipulation, raising critical challenges for reliable detection. However, we observe that current detectors primarily rely on global artifacts that appear as inpainting side effects, rather than on locally synthesized content. We show that this behavior occurs because VAE-based reconstruction induces a subtle but pervasive spectral shift across the entire image, including unedited regions. To isolate this effect, we introduce Inpainting Exchange (INP-X), an operation that restores original pixels outside the edited region while preserving all synthesized content. We create a 90K test dataset including real, inpainted, and exchanged images to evaluate this phenomenon. Under this intervention, pretrained state-of-the-art detectors, including commercial ones, exhibit a dramatic drop in accuracy (e.g., from 91% to 55%), frequently approaching chance level. We provide a theoretical analysis linking this behavior to high-frequency attenuation caused by VAE information bottlenecks. Our findings highlight the need for content-aware detection. Indeed, training on our dataset yields better generalization and localization than standard inpainting. Our dataset and code are publicly available at https://github.com/emirhanbilgic/INP-X.
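The exchange operation described above amounts to a masked composite: keep the synthesized pixels inside the edit mask, and restore the original pixels everywhere else. A minimal sketch in NumPy (the function name, array shapes, and mask convention are illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np

def inpainting_exchange(original, inpainted, mask):
    """Sketch of the INP-X idea: preserve synthesized content inside the
    edited region while restoring untouched original pixels outside it,
    which strips the global artifacts the inpainting pipeline adds
    outside the edit.

    original, inpainted: (H, W, C) arrays of the same dtype/shape.
    mask: (H, W) array, 1 inside the edited region, 0 outside
    (convention assumed here).
    """
    m = mask[..., None].astype(original.dtype)  # broadcast mask over channels
    return inpainted * m + original * (1.0 - m)
```

Fed to a detector, such an exchanged image contains only the locally synthesized content as evidence, so any accuracy drop relative to the full inpainted image can be attributed to global artifacts.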


Key Contributions

  • INP-X (Inpainting Exchange) operation that surgically restores original pixels outside the edited region to isolate synthesized content and reveal detector over-reliance on global VAE-induced spectral artifacts
  • 90K-image benchmark dataset with real/inpainted/exchanged triplets spanning 4 datasets and 3 inpainting models, used to evaluate 11 pretrained detectors and 2 commercial APIs
  • Theoretical analysis linking high-frequency attenuation from VAE information bottlenecks to the global spectral shift that detectors exploit as a shortcut, plus evidence that training on INP-X images improves cross-distribution generalization and localization
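The high-frequency attenuation attributed to VAE bottlenecks can be probed with a simple spectral statistic: the fraction of image energy above a radial frequency cutoff. This is a generic probe, not the paper's exact analysis; the cutoff value and grayscale input are assumptions. A VAE round-trip of an image should lower this ratio across the whole frame, including unedited regions, which is the global shortcut signal detectors can latch onto.

```python
import numpy as np

def high_freq_energy_ratio(img, cutoff=0.25):
    """Fraction of spectral power above a normalized radial frequency
    cutoff. img: 2-D grayscale array; cutoff: radius in (0, 0.5],
    measured in cycles per pixel along each axis.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(F) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized distance from the spectrum's center (DC component).
    r = np.sqrt(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2)
    return power[r > cutoff].sum() / power.sum()
```

Comparing this ratio between an original image and its inpainted counterpart, restricted to unedited regions, would surface the pervasive spectral shift the paper describes.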

🛡️ Threat Analysis

Output Integrity Attack

The paper directly targets AI-generated content detection integrity: it reveals that state-of-the-art inpainting detectors exploit global spectral artifacts (VAE fingerprints) rather than locally synthesized content, uses INP-X to defeat these detectors, and proposes improved training methodology for robust content-authenticity detection.


Details

Domains
vision · generative
Model Types
diffusion · cnn · transformer
Threat Tags
black_box · inference_time · digital
Datasets
Semi-Truths · INP-X (authors' 90K benchmark)
Applications
ai-generated image detection · inpainting detection · content authenticity verification