Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection

The rapid development of generative AI has made AI-generated images increasingly realistic and high-resolution. Most AI-generated image detection architectures downsample images before feeding them to the model, which risks losing fine-grained details. This paper presents GLASS (Global-Local Attention with Stratified Sampling), an architecture that combines a globally resized view with multiple randomly sampled local crops. The crops are original-resolution regions selected efficiently through spatially stratified sampling, and their features are aggregated using attention-based scoring. GLASS can be integrated into vision models to leverage both global and local information in images of any size. Vision Transformer, ResNet, and ConvNeXt models are used as backbones, and experiments show that GLASS achieves higher predictive performance than standard transfer learning within feasible computational constraints.
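The spatially stratified crop sampling described above can be illustrated with a minimal sketch. The summary does not give the paper's exact sampling scheme, so everything here is an assumption made for illustration: the function name `stratified_crop_boxes`, the grid-of-cells stratification, the square crop size, and the clamping rule at image borders are all hypothetical. The idea shown is only that the image is divided into cells and one original-resolution crop is drawn at random from each cell, so the crops cover the whole image instead of clustering.

```python
import random


def stratified_crop_boxes(height, width, crop_size, grid_rows, grid_cols, rng=None):
    """Return one (top, left, bottom, right) crop box per grid cell.

    Hypothetical helper: the grid layout and clamping rule are assumptions,
    not the paper's confirmed procedure.
    """
    rng = rng or random.Random()
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must fit inside the image")
    if grid_rows > height or grid_cols > width:
        raise ValueError("grid has more cells than pixels along an axis")

    boxes = []
    for r in range(grid_rows):
        for c in range(grid_cols):
            # Pixel bounds of this grid cell (the stratum).
            y0, y1 = r * height // grid_rows, (r + 1) * height // grid_rows
            x0, x1 = c * width // grid_cols, (c + 1) * width // grid_cols
            # Draw the crop's top-left corner inside the cell, then clamp it
            # so the full crop stays within the image.
            top = min(rng.randrange(y0, y1), height - crop_size)
            left = min(rng.randrange(x0, x1), width - crop_size)
            boxes.append((top, left, top + crop_size, left + crop_size))
    return boxes


# Example: nine 224x224 crops from a 1000x1500 image, one per cell of a 3x3 grid.
boxes = stratified_crop_boxes(1000, 1500, 224, 3, 3, rng=random.Random(0))
```

Because the boxes are taken from the original image without resizing, each crop keeps the pixel-level detail that a single downsampled view would lose. The separately resized global view supplies the overall context.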

Key Contributions

GLASS architecture: a two-stream global+local design using spatially stratified random crop sampling at original resolution to preserve fine-grained details lost by standard downsampling
Attention-based aggregation mechanism (additive attention scoring) that weights local crops by informativeness before combining them with the global feature stream
Comprehensive evaluation of GLASS across three backbone families (ViT, ResNet, ConvNeXt) showing consistent improvement over standard transfer learning baselines
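
The attention-based aggregation in the second contribution can be sketched as follows. This is a generic additive-attention pooling example, not the paper's confirmed formulation: the scoring form v · tanh(W h), the parameter names `W` and `v`, the helper names, and the fusion by concatenation are all assumptions chosen for illustration. In a real model `W` and `v` would be learned and the feature vectors would come from the backbone.

```python
import math


def additive_attention_pool(crop_feats, W, v):
    """Pool N crop feature vectors into one vector using additive attention.

    crop_feats: list of N vectors of length D
    W: hidden projection, H rows of length D (assumed learned)
    v: scoring vector of length H (assumed learned)
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in crop_feats:
        # Additive attention score: v . tanh(W h)
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(a * z for a, z in zip(v, hidden)))

    # Numerically stable softmax over the crop scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the crop features.
    dim = len(crop_feats[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, crop_feats)) for d in range(dim)]
    return pooled, weights


def fuse(global_feat, pooled_local):
    """Combine the two streams; concatenation is an assumption of this sketch."""
    return list(global_feat) + list(pooled_local)


# Toy example with three 2-dimensional crop features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
pooled, weights = additive_attention_pool(feats, W, v)
fused = fuse([0.5, 0.5], pooled)
```

In this toy setup the third crop receives the largest weight because its score is highest, which mirrors the idea of weighting crops by informativeness before the classifier sees the fused representation.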

🛡️ Threat Analysis

Output Integrity Attack

The paper directly contributes a novel AI-generated image detection architecture (GLASS). Its entire contribution is determining whether images are AI-generated, which falls squarely under output integrity and synthetic content detection (ML09). It proposes a new detection architecture rather than applying an existing detector to a new domain.

Details

Domains

vision, generative

Model Types

cnn, transformer

Threat Tags

inference_time, digital

Applications

2025, 2 citations

Output Integrity Attack (92%)