defense 2026

IncreFA: Breaking the Static Wall of Generative Model Attribution

Haotian Qin 1, Dongliang Chang 1, Yueying Gao 1, Lei Chen 2, Zhanyu Ma 1



Published on arXiv: 2604.17736

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves state-of-the-art attribution accuracy and a 98.93% unseen detection rate on IABench, which covers 28 generative models, under a temporally ordered open-set protocol
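This page does not give the exact protocol or metric definition. As a rough illustration, a temporally ordered open-set split can be sketched by sorting generators by release date, cutting them into sessions, and treating every generator from a later session as unseen. An unseen detection rate is then the fraction of unseen-generator images rejected as "unknown". The helper names (`temporal_sessions`, `split_at`, `unseen_detection_rate`), the confidence-threshold rejection rule, and the toy generator names and dates are all hypothetical, not IABench's actual definition.

```python
from datetime import date


def temporal_sessions(models, n_sessions):
    # Hypothetical helper: sort (name, release_date) pairs by date and cut them
    # into n_sessions contiguous, near-equal chunks; earlier sessions are learned first.
    ordered = [name for name, _ in sorted(models, key=lambda m: m[1])]
    base, extra = divmod(len(ordered), n_sessions)
    sessions, start = [], 0
    for k in range(n_sessions):
        end = start + base + (1 if k < extra else 0)
        sessions.append(ordered[start:end])
        start = end
    return sessions


def split_at(sessions, t):
    # After training on session t, every generator released later counts as unseen.
    seen = [m for s in sessions[: t + 1] for m in s]
    unseen = [m for s in sessions[t + 1 :] for m in s]
    return seen, unseen


def unseen_detection_rate(max_confidences, threshold):
    # Assumed rule: an image from an unseen generator is "detected" when its top
    # known-class confidence falls below the threshold (i.e. it is rejected).
    if not max_confidences:
        return 0.0
    rejected = sum(1 for c in max_confidences if c < threshold)
    return rejected / len(max_confidences)


# Toy generators with made-up release dates (not IABench data).
models = [
    ("gen_c", date(2024, 3, 1)),
    ("gen_a", date(2022, 8, 1)),
    ("gen_b", date(2023, 1, 1)),
    ("gen_d", date(2025, 2, 1)),
]
sessions = temporal_sessions(models, n_sessions=2)
seen, unseen = split_at(sessions, t=0)
```

Ordering by release date rather than shuffling is what makes the evaluation honest: the model is never trained on a generator that would not yet have existed at that point in time.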

IncreFA

Novel technique introduced


As AI generative models evolve at unprecedented speed, image attribution has become a moving target. New diffusion, adversarial and autoregressive generators appear almost monthly, making existing watermark, classifier and inversion methods obsolete upon release. The core problem lies not in model recognition, but in the inability to adapt attribution itself. We introduce IncreFA, a framework that redefines attribution as a structured incremental learning problem, allowing the system to learn continuously as new generative models emerge. IncreFA departs from conventional incremental learning by exploiting the hierarchical relationships among generative architectures and coupling them with continual adaptation. It integrates two mutually reinforcing mechanisms: (1) Hierarchical Constraints, which encode architectural hierarchies through learnable orthogonal priors to disentangle family-level invariants from model-specific idiosyncrasies; and (2) a Latent Memory Bank, which replays compact latent exemplars and mixes them to generate pseudo-unseen samples, stabilising representation drift and enhancing open-set awareness. On the newly constructed Incremental Attribution Benchmark (IABench) covering 28 generative models released between 2022 and 2025, IncreFA achieves state-of-the-art attribution accuracy and 98.93% unseen detection under a temporally ordered open-set protocol. Code will be available at https://github.com/Ant0ny44/IncreFA.
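How IncreFA parameterises its learnable orthogonal priors is not specified here. One minimal way to express the idea, assuming each generator family and each model-specific residual is represented by a learnable prior vector, is a penalty that is zero when family priors are mutually orthogonal and no model-specific residual shares a direction with any family prior. `orthogonality_penalty` and `hierarchical_penalty` are hypothetical names. In a real training loop the penalty would be added to the classification loss and minimised by gradient descent; this stdlib-only sketch only evaluates it.

```python
import math


def normalize(v):
    # Scale a vector to unit length; a zero vector has no direction to constrain.
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonality_penalty(priors):
    # ||G - I||_F^2, where G is the Gram matrix of the unit-normalized priors.
    # Equals 0 exactly when the priors are mutually orthogonal.
    units = [normalize(p) for p in priors]
    total = 0.0
    for i, a in enumerate(units):
        for j, b in enumerate(units):
            target = 1.0 if i == j else 0.0
            total += (dot(a, b) - target) ** 2
    return total


def hierarchical_penalty(family_priors, model_residuals):
    # Hypothetical hierarchical constraint:
    # (1) family priors are mutually orthogonal (disentangled family-level invariants);
    # (2) every model-specific residual is orthogonal to every family prior,
    #     so model-level idiosyncrasies do not re-encode family information.
    family_term = orthogonality_penalty(family_priors)
    cross_term = sum(
        dot(normalize(f), normalize(r)) ** 2
        for f in family_priors
        for r in model_residuals
    )
    return family_term + cross_term


# Toy priors: two families (e.g. diffusion, GAN) and two model-specific residuals.
families = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
residuals = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
penalty = hierarchical_penalty(families, residuals)
```

A residual that leaks a family direction, such as `[1.0, 0.0, 1.0, 0.0]`, raises the cross term, which is the behaviour a disentangling constraint of this kind is meant to penalise.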


Key Contributions

  • Hierarchical Constraints mechanism that encodes architectural hierarchies through learnable orthogonal priors to disentangle family-level invariants from model-specific features
  • Latent Memory Bank that replays compact latent exemplars and mixes them into pseudo-unseen samples, supporting open-set detection and stabilizing representation drift
  • IABench benchmark covering 28 generative models released between 2022 and 2025, with a temporally ordered open-set evaluation protocol
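The memory-bank idea can be sketched as follows, assuming fixed-size per-class storage of latent vectors and mixup-style interpolation between latents of two different known generators to form pseudo-unseen samples. `LatentMemoryBank`, the reservoir-sampling policy, and the mixing range (0.3, 0.7) are illustrative choices, not details confirmed by the paper.

```python
import random
from collections import defaultdict


class LatentMemoryBank:
    # Hypothetical sketch: stores at most `capacity_per_class` latents per generator.

    def __init__(self, capacity_per_class, seed=0):
        self.capacity = capacity_per_class
        self.rng = random.Random(seed)
        self.store = defaultdict(list)  # label -> stored latent vectors
        self.seen = defaultdict(int)    # label -> number of latents offered so far

    def add(self, label, latent):
        # Reservoir sampling keeps a uniform sample of the latents offered per class.
        self.seen[label] += 1
        bucket = self.store[label]
        if len(bucket) < self.capacity:
            bucket.append(list(latent))
        else:
            j = self.rng.randrange(self.seen[label])
            if j < self.capacity:
                bucket[j] = list(latent)

    def replay(self):
        # All stored (label, latent) pairs, to be mixed into later tasks' batches.
        return [(label, z) for label, zs in self.store.items() for z in zs]

    def pseudo_unseen(self, n, lam_range=(0.3, 0.7)):
        # Interpolate latents from two *different* known classes; the mixtures are
        # treated as "unknown" so the classifier learns an open-set rejection region.
        labels = [label for label, zs in self.store.items() if zs]
        if len(labels) < 2:
            return []
        out = []
        for _ in range(n):
            a, b = self.rng.sample(labels, 2)
            za = self.rng.choice(self.store[a])
            zb = self.rng.choice(self.store[b])
            lam = self.rng.uniform(*lam_range)
            out.append([lam * x + (1 - lam) * y for x, y in zip(za, zb)])
        return out


bank = LatentMemoryBank(capacity_per_class=2, seed=0)
for _ in range(5):
    bank.add("gen_a", [1.0, 0.0])
bank.add("gen_b", [0.0, 1.0])
mixes = bank.pseudo_unseen(10)
```

Keeping λ away from 0 and 1 places the mixtures between known clusters rather than on top of them, which is roughly the role a pseudo-unseen sample plays in open-set training.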

🛡️ Threat Analysis

Output Integrity Attack

The core contribution is attributing AI-generated images to the generative models (diffusion, GAN, autoregressive) that produced them, in order to trace content provenance. This falls under output integrity and content authenticity rather than model theft detection. The paper addresses how to maintain attribution accuracy as new generative models emerge, which is a content provenance problem.


Details

Domains
vision, generative
Model Types
diffusion, GAN, transformer
Threat Tags
inference_time
Datasets
IABench
Applications
image attribution, content provenance, generative model identification