defense 2026

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai 1, Zitong Yu 1, Jun Wang 1, Linlin Shen 2, Yong Xu 3, Xiaochun Cao 4

0 citations


Published on arXiv

2603.12208

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

At 10% token retention, ForensicZip achieves 2.97× speedup and over 90% FLOPs reduction while maintaining state-of-the-art performance on deepfake and AIGC detection benchmarks.

ForensicZip

Novel technique introduced


Multimodal Large Language Models (MLLMs) enable interpretable multimedia forensics by generating textual rationales for forgery detection. However, processing dense visual sequences incurs high computational costs, particularly for high-resolution images and videos. Visual token pruning is a practical acceleration strategy, yet existing methods are largely semantic-driven, retaining salient objects while discarding background regions where manipulation traces such as high-frequency anomalies and temporal jitters often reside. To address this issue, we introduce ForensicZip, a training-free framework that reformulates token compression from a forgery-driven perspective. ForensicZip models temporal token evolution as a Birth-Death Optimal Transport problem with a slack dummy node, quantifying physical discontinuities indicating transient generative artifacts. The forensic scoring further integrates transport-based novelty with high-frequency priors to separate forensic evidence from semantic content under large-ratio compression. Experiments on deepfake and AIGC benchmarks show that at 10% token retention, ForensicZip achieves 2.97× speedup and over 90% FLOPs reduction while maintaining state-of-the-art detection performance.
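The paper does not publish its implementation here, but the Birth-Death Optimal Transport idea described in the abstract can be sketched as an assignment problem: each current-frame token is matched to a previous-frame token, with slack "dummy" columns absorbing tokens that have no good match (births). A minimal illustration, with the function name `bdot_novelty`, cosine-distance cost, and the fixed `dummy_cost` all being assumptions of this sketch rather than details from the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bdot_novelty(prev_tokens, curr_tokens, dummy_cost=0.5):
    """Match current-frame tokens to previous-frame tokens by optimal
    assignment, with slack dummy columns so unmatched tokens (births)
    are absorbed at a fixed cost.  A token's novelty is its transport
    cost: matched tokens inherit their pairwise distance, newborn
    tokens inherit the dummy cost."""
    # Cosine distances between current and previous token embeddings.
    a = curr_tokens / np.linalg.norm(curr_tokens, axis=1, keepdims=True)
    b = prev_tokens / np.linalg.norm(prev_tokens, axis=1, keepdims=True)
    dist = 1.0 - a @ b.T                      # shape (N_curr, N_prev)
    # Append N_curr dummy columns so any number of tokens can be born.
    n = dist.shape[0]
    cost = np.hstack([dist, np.full((n, n), dummy_cost)])
    rows, cols = linear_sum_assignment(cost)  # rows come back sorted
    return cost[rows, cols]                   # per-token novelty score
```

Tokens whose appearance is a physical discontinuity (no counterpart in the previous frame) receive the dummy cost, which is exactly the transient-artifact signal the abstract says the method quantifies.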


Key Contributions

  • ForensicZip, a training-free token compression framework that prioritizes forensic evidence over semantic salience using Birth-Death Optimal Transport with a slack dummy node
  • A forensic scoring method integrating transport-based novelty with high-frequency priors to identify manipulation traces in visually non-salient regions under high pruning ratios
  • Achieves 2.97× inference speedup and >90% FLOPs reduction at 10% token retention while maintaining SOTA deepfake/AIGC detection performance
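The second contribution, a forensic score mixing transport-based novelty with a high-frequency prior, can be sketched as follows. This is an illustrative reading of the abstract, not the paper's implementation: the spectral-energy HF prior, the min-max normalization, the mixing weight `alpha`, and the helper names are all assumptions of this sketch.

```python
import numpy as np

def forensic_scores(patches, novelty, alpha=0.5, cutoff=0.25):
    """Combine a high-frequency prior with transport-based novelty.
    patches: (N, H, W) grayscale patches, one per visual token.
    novelty: (N,) transport novelty scores (e.g. from BD-OT matching).
    The HF prior is the fraction of spectral power outside a low-
    frequency disc of radius `cutoff` in normalized frequency."""
    n, h, w = patches.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    highpass = (fy**2 + fx**2) > cutoff**2         # boolean HF mask
    spec = np.abs(np.fft.fft2(patches))**2         # per-patch power spectrum
    hf_energy = spec[:, highpass].sum(1) / (spec.sum((1, 2)) + 1e-12)

    def norm01(x):  # map both cues to [0, 1] before mixing
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    return alpha * norm01(novelty) + (1 - alpha) * norm01(hf_energy)

def keep_topk(scores, ratio=0.10):
    """Indices of tokens retained at the given keep ratio (10% here
    matching the paper's headline operating point)."""
    k = max(1, int(round(ratio * len(scores))))
    return np.argsort(scores)[::-1][:k]
```

Under this scoring, a flat background patch carrying high-frequency residue outranks a smooth but semantically salient one, which is the forgery-driven (rather than semantic-driven) ordering the contribution describes.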

🛡️ Threat Analysis

Output Integrity Attack

ForensicZip is a detection framework for AI-generated/deepfake content. The core contribution is a novel forensic technique — Birth-Death Optimal Transport scoring — that identifies tokens containing manipulation traces (high-frequency anomalies, temporal jitters), directly improving the methodology of AI-generated content detection in forensic MLLMs.


Details

Domains
vision, multimodal, nlp
Model Types
vlm, llm, transformer
Threat Tags
inference_time
Applications
deepfake detection, AI-generated content detection, multimedia forensics