defense 2026

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai 1, Zitong Yu 1, Jun Wang 1, Linlin Shen 2, Yong Xu 3, Xiaochun Cao 4

0 citations


Published on arXiv

2603.12208

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

At 10% token retention, ForensicZip achieves 2.97× speedup and over 90% FLOPs reduction while maintaining state-of-the-art performance on deepfake and AIGC detection benchmarks.

ForensicZip

Novel technique introduced


Multimodal Large Language Models (MLLMs) enable interpretable multimedia forensics by generating textual rationales for forgery detection. However, processing dense visual sequences incurs high computational costs, particularly for high-resolution images and videos. Visual token pruning is a practical acceleration strategy, yet existing methods are largely semantic-driven, retaining salient objects while discarding background regions where manipulation traces such as high-frequency anomalies and temporal jitters often reside. To address this issue, we introduce ForensicZip, a training-free framework that reformulates token compression from a forgery-driven perspective. ForensicZip models temporal token evolution as a Birth-Death Optimal Transport problem with a slack dummy node, quantifying physical discontinuities indicating transient generative artifacts. The forensic scoring further integrates transport-based novelty with high-frequency priors to separate forensic evidence from semantic content under large-ratio compression. Experiments on deepfake and AIGC benchmarks show that at 10% token retention, ForensicZip achieves 2.97× speedup and over 90% FLOPs reduction while maintaining state-of-the-art detection performance.
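The paper does not publish its implementation here, but the Birth-Death Optimal Transport idea described in the abstract can be sketched as an assignment problem: each current-frame token is matched to a previous-frame token, with slack "dummy" columns absorbing tokens that have no good match (births). A minimal illustration, with the function name `bdot_novelty`, cosine-distance cost, and the fixed `dummy_cost` all being assumptions of this sketch rather than details from the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bdot_novelty(prev_tokens, curr_tokens, dummy_cost=0.5):
    """Match current-frame tokens to previous-frame tokens by optimal
    assignment, with slack dummy columns so unmatched tokens (births)
    are absorbed at a fixed cost.  A token's novelty is its transport
    cost: matched tokens inherit their pairwise distance, newborn
    tokens inherit the dummy cost."""
    # Cosine distances between current and previous token embeddings.
    a = curr_tokens / np.linalg.norm(curr_tokens, axis=1, keepdims=True)
    b = prev_tokens / np.linalg.norm(prev_tokens, axis=1, keepdims=True)
    dist = 1.0 - a @ b.T                      # shape (N_curr, N_prev)
    # Append N_curr dummy columns so any number of tokens can be born.
    n = dist.shape[0]
    cost = np.hstack([dist, np.full((n, n), dummy_cost)])
    rows, cols = linear_sum_assignment(cost)  # rows come back sorted
    return cost[rows, cols]                   # per-token novelty score
```

Tokens whose appearance is a physical discontinuity (no counterpart in the previous frame) receive the dummy cost, which is exactly the transient-artifact signal the abstract says the method quantifies.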


Key Contributions

  • ForensicZip, a training-free token compression framework that prioritizes forensic evidence over semantic salience using Birth-Death Optimal Transport with a slack dummy node
  • A forensic scoring method integrating transport-based novelty with high-frequency priors to identify manipulation traces in visually non-salient regions under high pruning ratios
  • Achieves 2.97× inference speedup and >90% FLOPs reduction at 10% token retention while maintaining SOTA deepfake/AIGC detection performance
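The second contribution, a forensic score mixing transport-based novelty with a high-frequency prior, can be sketched as follows. This is an illustrative reading of the abstract, not the paper's implementation: the spectral-energy HF prior, the min-max normalization, the mixing weight `alpha`, and the helper names are all assumptions of this sketch.

```python
import numpy as np

def forensic_scores(patches, novelty, alpha=0.5, cutoff=0.25):
    """Combine a high-frequency prior with transport-based novelty.
    patches: (N, H, W) grayscale patches, one per visual token.
    novelty: (N,) transport novelty scores (e.g. from BD-OT matching).
    The HF prior is the fraction of spectral power outside a low-
    frequency disc of radius `cutoff` in normalized frequency."""
    n, h, w = patches.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    highpass = (fy**2 + fx**2) > cutoff**2         # boolean HF mask
    spec = np.abs(np.fft.fft2(patches))**2         # per-patch power spectrum
    hf_energy = spec[:, highpass].sum(1) / (spec.sum((1, 2)) + 1e-12)

    def norm01(x):  # map both cues to [0, 1] before mixing
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    return alpha * norm01(novelty) + (1 - alpha) * norm01(hf_energy)

def keep_topk(scores, ratio=0.10):
    """Indices of tokens retained at the given keep ratio (10% here
    matching the paper's headline operating point)."""
    k = max(1, int(round(ratio * len(scores))))
    return np.argsort(scores)[::-1][:k]
```

Under this scoring, a flat background patch carrying high-frequency residue outranks a smooth but semantically salient one, which is the forgery-driven (rather than semantic-driven) ordering the contribution describes.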

🛡️ Threat Analysis

Output Integrity Attack

ForensicZip is a detection framework for AI-generated/deepfake content. The core contribution is a novel forensic technique — Birth-Death Optimal Transport scoring — that identifies tokens containing manipulation traces (high-frequency anomalies, temporal jitters), directly improving the methodology of AI-generated content detection in forensic MLLMs.


Details

Domains
vision, multimodal, nlp
Model Types
vlm, llm, transformer
Threat Tags
inference_time
Applications
deepfake detection, AI-generated content detection, multimedia forensics