
SAGA: Source Attribution of Generative AI Videos

Rohit Kundu 1,2, Vishal Mohanty 1, Hao Xiong 3, Shan Jia 1, Athula Balachandran 1, Amit K. Roy-Chowdhury 2


Published on arXiv: 2511.12834

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SAGA matches fully supervised attribution performance using only 0.5% of source-labeled data per class across five attribution granularities, setting new state-of-the-art in synthetic video provenance.

SAGA (Temporal Attention Signatures / T-Sigs)

Novel technique introduced


The proliferation of generative AI has led to hyper-realistic synthetic videos, escalating misuse risks and outstripping binary real/fake detectors. We introduce SAGA (Source Attribution of Generative AI videos), the first comprehensive framework to address the urgent need for AI-generated video source attribution at a large scale. Unlike traditional detection, SAGA identifies the specific generative model used. It uniquely provides multi-granular attribution across five levels: authenticity, generation task (e.g., T2V/I2V), model version, development team, and the precise generator, offering far richer forensic insights. Our novel video transformer architecture, leveraging features from a robust vision foundation model, effectively captures spatio-temporal artifacts. Critically, we introduce a data-efficient pretrain-and-attribute strategy, enabling SAGA to achieve state-of-the-art attribution using only 0.5% of source-labeled data per class, matching fully supervised performance. Furthermore, we propose Temporal Attention Signatures (T-Sigs), a novel interpretability method that visualizes learned temporal differences, offering the first explanation for why different video generators are distinguishable. Extensive experiments on public datasets, including cross-domain scenarios, demonstrate that SAGA sets a new benchmark for synthetic video provenance, providing crucial, interpretable insights for forensic and regulatory applications.
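The multi-granular attribution described above can be sketched as a set of classification heads over pooled per-frame foundation-model features. This is a minimal, hypothetical illustration — the dimensions, class counts, and pooling choice are assumptions, not the paper's actual architecture (which uses a video transformer rather than mean pooling):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T frames, each a D-dim foundation-model feature.
T, D = 16, 384

# Class counts at the five attribution granularities are illustrative only.
GRANULARITIES = {
    "authenticity": 2,
    "generation_task": 3,
    "model_version": 8,
    "development_team": 6,
    "generator": 12,
}

# One linear head per granularity over a pooled video embedding.
heads = {name: rng.standard_normal((D, k)) * 0.02
         for name, k in GRANULARITIES.items()}

def attribute(frame_features: np.ndarray) -> dict:
    """Mean-pool per-frame features over time, then score every granularity."""
    video_emb = frame_features.mean(axis=0)               # (D,)
    return {name: int(np.argmax(video_emb @ W))           # predicted class id
            for name, W in heads.items()}

features = rng.standard_normal((T, D))  # stand-in for real frame features
preds = attribute(features)             # one label per granularity level
```

The point of the sketch is the shared backbone embedding feeding five independent heads, so a single forward pass yields the full attribution hierarchy.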


Key Contributions

  • Multi-granular video source attribution across five levels (authenticity, generation task, model version, development team, precise generator) using a video transformer over vision foundation model features
  • Data-efficient pretrain-and-attribute strategy with Hard Negative Mining objective that matches fully supervised performance using only 0.5% of source-labeled data per class
  • Temporal Attention Signatures (T-Sigs): first interpretability method visualizing learned spatio-temporal fingerprints distinguishing different AI video generators, including unseen ones
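The hard negative mining objective in the second contribution can be illustrated with a batch-hard triplet loss, a standard formulation used here purely as a plausible stand-in — the paper's exact loss and mining rule may differ:

```python
import numpy as np

def hard_negative_triplet_loss(emb: np.ndarray, labels: np.ndarray,
                               margin: float = 0.2) -> float:
    """Batch-hard triplet objective: for each anchor, the farthest same-class
    sample is the hard positive and the closest other-class sample is the
    hard negative."""
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)  # (N, N)
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(len(emb)):
        pos_mask = same[i].copy()
        pos_mask[i] = False          # an anchor is not its own positive
        neg_mask = ~same[i]
        if not pos_mask.any() or not neg_mask.any():
            continue
        hard_pos = d[i][pos_mask].max()
        hard_neg = d[i][neg_mask].min()
        losses.append(max(0.0, hard_pos - hard_neg + margin))
    return float(np.mean(losses)) if losses else 0.0

# Well-separated source clusters incur no loss; interleaved ones do.
emb_sep = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
loss_separated = hard_negative_triplet_loss(emb_sep, np.array([0, 0, 1, 1]))
emb_mix = np.array([[0.0, 0.0], [0.05, 0.0], [0.1, 0.0], [0.15, 0.0]])
loss_mixed = hard_negative_triplet_loss(emb_mix, np.array([0, 1, 0, 1]))
```

Mining the hardest negatives is what makes such an objective data-efficient: with only 0.5% of labels per class, the few labeled samples that are most confusable between generators dominate the gradient.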

🛡️ Threat Analysis

Output Integrity Attack

SAGA directly addresses AI-generated content detection and provenance — it goes beyond binary real/fake detection to attribute synthetic videos to their specific generative model, development team, and version. It introduces novel forensic techniques (Temporal Attention Signatures, a pretrain-and-attribute strategy) for verifying and tracing model output provenance, which is the core concern of ML09.
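To make the Temporal Attention Signatures idea concrete: one simple way to extract a temporal profile from self-attention over frames is to average the attention map over its query axis, yielding a per-frame "how much is this moment attended to" curve. This is a speculative sketch of the general mechanism, not the paper's T-Sigs algorithm; `Wq` and `Wk` are hypothetical projection matrices:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_signature(frame_feats: np.ndarray,
                                 Wq: np.ndarray, Wk: np.ndarray) -> np.ndarray:
    """Compute a (T, T) frame-to-frame self-attention map, then average over
    queries to get a 1-D temporal signature of length T."""
    Q, K = frame_feats @ Wq, frame_feats @ Wk
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)  # rows sum to 1
    return attn.mean(axis=0)                                  # (T,)

rng = np.random.default_rng(1)
T, D = 8, 16
sig = temporal_attention_signature(rng.standard_normal((T, D)),
                                   rng.standard_normal((D, D)),
                                   rng.standard_normal((D, D)))
```

Comparing such signatures across videos from different generators is the kind of visualization T-Sigs provides: distinct generators leave distinct temporal attention fingerprints.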


Details

Domains
vision, generative
Model Types
transformer, diffusion
Threat Tags
digital, inference_time
Applications
ai-generated video forensics, synthetic video provenance, regulatory content attribution