defense 2026

Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection

Lawrence Han

0 citations · 16 references · arXiv


Published on arXiv

2601.00141

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

GLASS outperforms standard transfer learning across ViT, ResNet, and ConvNeXt backbones on AI-generated image detection while remaining computationally feasible by sampling crops rather than exhaustively tiling the full image.

GLASS (Global-Local Attention with Stratified Sampling)

Novel technique introduced


The rapid development of generative AI has made AI-generated images increasingly realistic and high-resolution. Most AI-generated image detection architectures downsample images before feeding them to the model, which risks losing fine-grained details. This paper presents GLASS (Global-Local Attention with Stratified Sampling), an architecture that combines a globally resized view with multiple randomly sampled local crops. The crops are original-resolution regions selected efficiently through spatially stratified sampling and aggregated using attention-based scoring. GLASS can be integrated into vision models to leverage both global and local information in images of any size. Vision Transformer, ResNet, and ConvNeXt models serve as backbones, and experiments show that GLASS achieves higher predictive performance than standard transfer learning within feasible computational constraints.
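
To make the spatially stratified sampling step concrete, here is a minimal Python sketch. It is not the paper's implementation: the function name `stratified_crop_boxes`, the square `grid` layout, and the choice of one crop per grid cell are all assumptions made for illustration. The idea shown is only that the range of valid crop positions is split into cells and one original-resolution crop is drawn at random from each, so the crops cover the whole image without tiling it exhaustively.

```python
import random


def stratified_crop_boxes(width, height, crop, grid, seed=None):
    """Return one random crop box per grid cell as (left, top, right, bottom).

    Hypothetical helper for illustration; the grid layout and the
    one-crop-per-cell rule are assumptions, not details from the paper.
    """
    if crop > width or crop > height:
        raise ValueError("crop must fit inside the image")
    rng = random.Random(seed)
    # Largest valid top-left corner so the crop stays inside the image.
    max_left, max_top = width - crop, height - crop
    boxes = []
    for row in range(grid):
        for col in range(grid):
            # Stratum: the slice of valid top-left corners owned by this cell.
            x_lo = max_left * col // grid
            x_hi = max_left * (col + 1) // grid
            y_lo = max_top * row // grid
            y_hi = max_top * (row + 1) // grid
            left = rng.randint(x_lo, x_hi)  # randint is inclusive on both ends
            top = rng.randint(y_lo, y_hi)
            boxes.append((left, top, left + crop, top + crop))
    return boxes


# Example: nine 224x224 crops from a 4096x3072 image.
boxes = stratified_crop_boxes(4096, 3072, crop=224, grid=3, seed=0)
```

Each box uses the (left, upper, right, lower) convention, so it could be passed to a cropping routine such as Pillow's `Image.crop`. With these illustrative numbers the local stream processes 9 crops, whereas tiling the same image exhaustively with non-overlapping 224x224 tiles would need roughly 18 x 13 = 234.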


Key Contributions

  • GLASS architecture: a two-stream global+local design using spatially stratified random crop sampling at original resolution to preserve fine-grained details lost by standard downsampling
  • Attention-based aggregation mechanism (additive attention scoring) that weights local crops by informativeness before combining with the global feature stream
  • Comprehensive evaluation of GLASS across three backbone families (ViT, ResNet, ConvNeXt) showing consistent improvement over standard transfer learning baselines
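
The attention-based aggregation in the second contribution can be sketched as additive attention pooling over per-crop feature vectors. The sketch below is a hedged illustration in pure Python, not the paper's code: the scoring form `v . tanh(W h + b)`, the parameter names `W`, `b`, `v`, and the final concatenation with the global feature are all assumptions; the source states only that additive attention scoring weights the crops before they are combined with the global stream.

```python
import math


def additive_attention_pool(crop_feats, W, b, v):
    """Pool crop features with additive attention (assumed form).

    score_i = v . tanh(W h_i + b); weights = softmax(scores);
    pooled = sum_i weights_i * h_i. Returns (pooled, weights).
    """
    scores = []
    for h in crop_feats:
        hidden = [
            math.tanh(sum(w_jk * h_k for w_jk, h_k in zip(row, h)) + b_j)
            for row, b_j in zip(W, b)
        ]
        scores.append(sum(v_j * u_j for v_j, u_j in zip(v, hidden)))
    # Numerically stable softmax over the crop scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(crop_feats[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, crop_feats)) for d in range(dim)
    ]
    return pooled, weights


# Toy example: three crops with 2-dimensional features.
crop_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical learned projection
b = [0.0, 0.0]
v = [1.0, 1.0]
pooled, weights = additive_attention_pool(crop_feats, W, b, v)

# Assumed fusion: concatenate the global feature with the pooled local one.
global_feat = [0.5, 0.5]
fused = global_feat + pooled
```

In a real model the features would come from the backbone and `W`, `b`, `v` would be trained jointly with the classifier head. The toy values here only show that a crop with a higher score receives a larger share of the pooled representation.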

🛡️ Threat Analysis

Output Integrity Attack

Directly contributes a novel AI-generated image detection architecture (GLASS). The paper's entire contribution is authenticating whether images are AI-generated, which falls squarely under output integrity and synthetic content detection in ML09. This is a novel detection architecture, not a domain application of an existing detector.


Details

Domains
vision · generative
Model Types
cnn · transformer
Threat Tags
inference_time · digital
Applications
ai-generated image detection · synthetic image forensics