SWIFT: Sliding Window Reconstruction for Few-Shot Training-Free Generated Video Attribution
Chao Wang 1, Zijin Yang 1, Yaofei Wang 2, Yuang Qi 1, Weiming Zhang 1, Nenghai Yu 1, Kejiang Chen 1
Published on arXiv
2603.08536
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves over 90% average source attribution accuracy across five state-of-the-art video generation models using as few as 20 samples, with zero-shot attribution possible for three models.
SWIFT
Novel technique introduced
Recent advances in video generation have been significant, leading to widespread application across many domains. However, concerns are mounting over the potential misuse of generated content, and tracing the origin of generated videos has become crucial for mitigating misuse and identifying responsible parties. Existing video attribution methods require additional operations or the training of source attribution models, which may degrade video quality or demand large numbers of training samples. To address these challenges, we define for the first time the "few-shot training-free generated video attribution" task and propose SWIFT, a method tightly integrated with the temporal characteristics of video. By leveraging the "Pixel Frames (many) to Latent Frame (one)" temporal mapping within each video chunk, SWIFT applies a fixed-length sliding window to perform two distinct reconstructions: one normal and one corrupted. The variation in loss between the two reconstructions is then used as an attribution signal. We conducted an extensive evaluation on five state-of-the-art (SOTA) video generation models. Experimental results show that SWIFT achieves over 90% average attribution accuracy with merely 20 video samples across all models, and even enables zero-shot attribution for HunyuanVideo, EasyAnimate, and Wan2.2. Our source code is available at https://github.com/wangchao0708/SWIFT.
Key Contributions
- First formal definition of the "few-shot training-free generated video attribution" task
- SWIFT method exploiting the "Pixel Frames → Latent Frame" temporal mapping within video chunks via a fixed-length sliding window performing normal vs. corrupted reconstructions, using the loss variation as an attribution signal
- Achieves >90% average attribution accuracy across 5 SOTA video generators using only 20 samples, with zero-shot capability for HunyuanVideo, EasyAnimate, and Wan2.2
🛡️ Threat Analysis
Addresses AI-generated content provenance and attribution — determining which generative model produced a given video. This is a novel forensic technique for content authenticity and traceability, squarely within ML09's scope of output integrity and AI-generated content detection/attribution.