SWIFT: Sliding Window Reconstruction for Few-Shot Training-Free Generated Video Attribution

Chao Wang 1, Zijin Yang 1, Yaofei Wang 2, Yuang Qi 1, Weiming Zhang 1, Nenghai Yu 1, Kejiang Chen 1

Published on arXiv

2603.08536

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves over 90% average source attribution accuracy across five state-of-the-art video generation models using as few as 20 samples, with zero-shot attribution possible for three models.

SWIFT

Novel technique introduced


Recent advances in video generation have been significant, leading to widespread application across multiple domains. However, concerns have been mounting over the potential misuse of generated content. Tracing the origin of generated videos has become crucial to mitigate misuse and identify responsible parties. Existing video attribution methods require additional operations or the training of source attribution models, which may degrade video quality or demand large amounts of training samples. To address these challenges, we define for the first time the "few-shot training-free generated video attribution" task and propose SWIFT, which is tightly integrated with the temporal characteristics of video. By leveraging the "Pixel Frames (many) to Latent Frame (one)" temporal mapping within each video chunk, SWIFT applies a fixed-length sliding window to perform two distinct reconstructions: normal and corrupted. The variation in the losses between the two reconstructions is then used as an attribution signal. We conducted an extensive evaluation of five state-of-the-art (SOTA) video generation models. Experimental results show that SWIFT achieves over 90% average attribution accuracy with merely 20 video samples across all models and even enables zero-shot attribution for HunyuanVideo, EasyAnimate, and Wan2.2. Our source code is available at https://github.com/wangchao0708/SWIFT.


Key Contributions

  • First formal definition of the 'few-shot training-free generated video attribution' task
  • SWIFT method exploiting the 'Pixel Frames→Latent Frame' temporal mapping within video chunks via a fixed-length sliding window performing normal vs. corrupted reconstructions, using the loss variation as an attribution signal
  • Achieves >90% average attribution accuracy across 5 SOTA video generators using only 20 samples, with zero-shot capability for HunyuanVideo, EasyAnimate, and Wan2.2
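The scoring mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `reconstruct` stands in for a candidate model's latent round-trip (e.g. its VAE encode/decode over a chunk), `corrupt` for the perturbation applied to each window, and `window` for the model's pixel-frames-to-latent-frame temporal compression length. All function names and parameters here are hypothetical placeholders.

```python
import numpy as np

def sliding_windows(frames, window, stride=1):
    """Yield fixed-length windows of pixel frames along the time axis."""
    t = frames.shape[0]
    for start in range(0, t - window + 1, stride):
        yield frames[start:start + window]

def attribution_score(frames, reconstruct, corrupt, window=4):
    """Hypothetical SWIFT-style score: the mean gap between normal and
    corrupted reconstruction losses over sliding windows. A model tends
    to reconstruct its own outputs well but degrades sharply once the
    window is corrupted, so a larger gap suggests the candidate model
    generated the video."""
    gaps = []
    for win in sliding_windows(frames, window):
        # Normal reconstruction loss on the untouched window.
        loss_normal = np.mean((reconstruct(win) - win) ** 2)
        # Corrupted reconstruction loss on the perturbed window.
        corrupted = corrupt(win)
        loss_corrupted = np.mean((reconstruct(corrupted) - corrupted) ** 2)
        gaps.append(abs(loss_corrupted - loss_normal))
    return float(np.mean(gaps))
```

In the few-shot setting, this score would be computed under each candidate model and calibrated on the handful of labeled samples; in the zero-shot case, the video is attributed to whichever candidate yields the most separable gap.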

🛡️ Threat Analysis

Output Integrity Attack

Addresses AI-generated content provenance and attribution — determining which generative model produced a given video. This is a novel forensic technique for content authenticity and traceability, squarely within ML09's scope of output integrity and AI-generated content detection/attribution.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time, black_box
Datasets
HunyuanVideo, EasyAnimate, Wan2.2
Applications
video generation attribution, ai-generated video forensics, content provenance