benchmark 2026

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

0 citations

Published on arXiv

2604.03819

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Introduces the first benchmark for activity-level video forgery localization, with comprehensive evaluation protocols covering intra-domain, cross-domain, and open-world settings

TADiff (Temporal Artifact Diffuser)

Novel technique introduced

Temporal forgery localization aims to identify which segments of a video have been manipulated. Most existing benchmarks focus on appearance-level forgeries, such as face swapping and object removal. However, recent advances in video generation have enabled activity-level forgeries that modify human actions to distort event semantics, producing highly deceptive content that undermines media authenticity and public trust. To address this gap, we introduce ActivityForensics, the first large-scale benchmark for localizing manipulated activity in videos. It contains over 6K forged video segments that are seamlessly blended into the video context, giving them high visual consistency that makes them almost indistinguishable from authentic content to the human eye. We further propose Temporal Artifact Diffuser (TADiff), a simple yet effective baseline that exposes artifact cues through a diffusion-based feature regularizer. Based on ActivityForensics, we introduce comprehensive evaluation protocols covering intra-domain, cross-domain, and open-world settings, and benchmark a wide range of state-of-the-art forgery localizers to facilitate future research. The dataset and code are available at https://activityforensics.github.io.
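To make the localization task concrete, the sketch below scores predicted manipulated segments against ground-truth segments using temporal intersection-over-union (IoU), a measure commonly used in temporal localization work. This summary does not state which metrics ActivityForensics reports, so the choice of IoU, the 0.5 threshold, and the names `temporal_iou` and `recall_at_iou` are illustrative assumptions, not the benchmark's official protocol.

```python
from typing import List, Tuple

# A segment is (start_sec, end_sec). This representation is an assumption
# made for illustration; the benchmark's annotation format may differ.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Segment], gts: List[Segment], thr: float = 0.5) -> float:
    """Fraction of ground-truth forged segments matched by at least one
    prediction with IoU >= thr (hypothetical helper, not from the paper)."""
    if not gts:
        return 0.0
    hits = sum(1 for gt in gts if any(temporal_iou(p, gt) >= thr for p in preds))
    return hits / len(gts)


if __name__ == "__main__":
    predicted = [(2.0, 6.0)]
    ground_truth = [(4.0, 8.0)]
    print(temporal_iou(predicted[0], ground_truth[0]))
    print(recall_at_iou(predicted, ground_truth, thr=0.5))
```

A localizer that flags the right region but with loose boundaries gets a low IoU, which is why threshold-based measures of this kind reward precise start and end times and not just detection of the forged video as a whole.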

Key Contributions

ActivityForensics: the first large-scale benchmark dataset for temporal forgery localization, with 6K+ manipulated activity video segments
A grounding-assisted data construction pipeline that uses video captioning, LLMs, and video generation models to automatically create seamless activity-level forgeries
TADiff: a baseline detector that uses diffusion-based feature regularization to expose temporal artifacts in manipulated video segments

🛡️ Threat Analysis

Output Integrity Attack

The paper addresses detecting AI-generated video content (activity-level forgeries) and localizing the manipulated segments within videos. Both tasks are forms of AI-generated content detection and output integrity verification, the core of ML09.

Details

Domains

vision, multimodal

Model Types

diffusion, transformer

Threat Tags

inference_time

Datasets

ActivityForensics

Applications

video forensics, deepfake detection, media authenticity verification

Similar Papers

Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models

Human-AI Ensembles Improve Deepfake Detection in Low-to-Medium Quality Videos

The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds

Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images

TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark

AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences

ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity

Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models