defense 2025

GAMMA: Generalizable Alignment via Multi-task and Manipulation-Augmented Training for AI-Generated Image Detection

Haozhen Yan 1, Yan Hong 2, Suning Lang 1, Jiahui Zhan 1, Yikun Ji 1, Yujie Gao 1, Huijia Zhu 2, Jun Lan 2, Jianfu Zhang 1



Published on arXiv

2509.10250

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art generalization on the GenImage benchmark, surpassing the prior best by 5.8%, and maintains strong performance on GPT-4o-generated images.

GAMMA

Novel technique introduced


As generative models become more sophisticated and diverse, detecting AI-generated images grows increasingly challenging. While existing AI-generated image detectors achieve promising performance on in-distribution generated images, their generalization to unseen generative models remains limited. This limitation is largely attributed to their reliance on generation-specific artifacts, such as stylistic priors and compression patterns. To address these limitations, we propose GAMMA, a novel training framework designed to reduce domain bias and enhance semantic alignment. GAMMA introduces diverse manipulation strategies, such as inpainting-based manipulation and semantics-preserving perturbations, to ensure consistency between manipulated and authentic content. We employ multi-task supervision with dual segmentation heads and a classification head, enabling pixel-level source attribution across diverse generative domains. In addition, a reverse cross-attention mechanism is introduced to allow the segmentation heads to guide and correct biased representations in the classification branch. Our method not only achieves state-of-the-art generalization performance on the GenImage benchmark, improving accuracy by 5.8%, but also maintains strong robustness on newly released generative models such as GPT-4o.
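The abstract's reverse cross-attention mechanism can be illustrated with a minimal numpy sketch. This is a hypothetical reading of the idea, not the paper's implementation: the image-level classification feature acts as the query, segmentation-branch features at each spatial position act as keys/values, and a residual update lets pixel-level evidence correct the classification representation. All names (`reverse_cross_attention`, `cls_feat`, `seg_feats`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reverse_cross_attention(cls_feat, seg_feats):
    """Hypothetical sketch: the classification feature (query) attends over
    segmentation-branch features (keys/values), so pixel-level manipulation
    evidence can guide and correct the image-level representation."""
    d_k = cls_feat.shape[-1]
    # cls_feat: (d,), seg_feats: (n, d) -- n flattened spatial positions
    scores = seg_feats @ cls_feat / np.sqrt(d_k)   # (n,) attention logits
    weights = softmax(scores)                      # (n,) sums to 1
    corrected = cls_feat + weights @ seg_feats     # residual correction
    return corrected, weights

rng = np.random.default_rng(0)
cls_feat = rng.normal(size=8)        # toy image-level feature
seg_feats = rng.normal(size=(16, 8)) # toy 4x4 segmentation feature map
out, w = reverse_cross_attention(cls_feat, seg_feats)
print(out.shape, round(float(w.sum()), 6))
```

In a real model the query/key/value tensors would pass through learned projections; the sketch keeps only the attention-and-residual structure that lets the segmentation branch reshape the classification feature.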


Key Contributions

  • Manipulation-augmented training using inpainting, copy-move, and splicing to reduce generation-specific bias in AI-generated image detectors
  • Multi-task supervision with dual segmentation heads enabling pixel-level source attribution alongside image-level classification
  • Reverse cross-attention mechanism that allows segmentation branches to correct biased representations in the classification branch
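The manipulation-augmented training in the first contribution can be sketched for the copy-move case. This is a minimal illustration under stated assumptions (function name, patch size, and mask convention are all hypothetical, not from the paper): a random patch is pasted elsewhere in the same image, and the binary mask of manipulated pixels serves as the pixel-level segmentation target.

```python
import numpy as np

def copy_move(img, rng, patch=8):
    """Hypothetical copy-move augmentation: copy a random patch of the
    image to another location in the same image. Returns the manipulated
    image and a binary mask of manipulated pixels (segmentation target)."""
    h, w = img.shape[:2]
    out = img.copy()
    mask = np.zeros((h, w), dtype=np.uint8)
    # Source and destination top-left corners, kept fully inside the image
    sy, sx = rng.integers(0, h - patch), rng.integers(0, w - patch)
    dy, dx = rng.integers(0, h - patch), rng.integers(0, w - patch)
    out[dy:dy + patch, dx:dx + patch] = img[sy:sy + patch, sx:sx + patch]
    mask[dy:dy + patch, dx:dx + patch] = 1
    return out, mask

rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug, mask = copy_move(img, rng)
print(aug.shape, int(mask.sum()))  # mask covers patch*patch pixels
```

Because the pasted content comes from an authentic image, the detector cannot rely on generator-specific artifacts and must instead localize the inconsistency itself, which is the stated goal of reducing generation-specific bias.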

🛡️ Threat Analysis

Output Integrity Attack

Primary contribution is a novel AI-generated image detection framework — directly addresses output integrity and content authenticity verification. Proposes new architecture (reverse cross-attention, dual segmentation heads) and training strategy to generalize detection to unseen generative models.


Details

Domains
vision
Model Types
transformer, diffusion, generative
Threat Tags
inference_time
Datasets
GenImage
Applications
AI-generated image detection, synthetic image forensics