ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection
Published on arXiv: 2511.14554
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
ForensicFlow achieves AUC 0.9752 and F1 0.9408 on Celeb-DF(v2), outperforming single-stream detectors through multi-domain branch fusion.
ForensicFlow
Novel technique introduced
Modern deepfakes evade detection by leaving subtle, domain-specific artifacts that single-branch networks miss. ForensicFlow addresses this by fusing evidence across three forensic dimensions: global visual inconsistencies (via ConvNeXt-tiny), fine-grained texture anomalies (via Swin Transformer-tiny), and spectral noise patterns (via a CNN with channel attention). Our attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive fusion weights each branch according to forgery type. Trained on Celeb-DF(v2) with Focal Loss, the model achieves AUC 0.9752, F1 0.9408, and accuracy 0.9208, outperforming single-stream detectors. Ablation studies confirm branch synergy, and Grad-CAM visualizations validate focus on genuine manipulation regions (e.g., facial boundaries). This multi-domain fusion strategy establishes robustness against increasingly sophisticated forgeries.
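The summary does not spell out the pooling formulation, but attention-based temporal pooling of the kind described is conventionally a learned scalar score per frame followed by a softmax-weighted average. A minimal NumPy sketch under that assumption (the scoring vector `w` and function names are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_pool(frame_feats, w):
    """Collapse per-frame features (T, D) into a single clip feature (D,).

    Each frame t gets a scalar evidence score w . f_t; softmax turns the
    scores into weights that sum to 1, so high-evidence frames dominate
    the pooled representation instead of being averaged away.
    """
    scores = frame_feats @ w      # (T,) per-frame evidence scores
    alpha = softmax(scores)       # (T,) attention weights
    return alpha @ frame_feats    # (D,) weighted sum over frames
```

With a strongly scored forgery frame, the pooled feature is dominated by that frame, which is the "prioritize high-evidence frames" behavior the abstract describes.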
Key Contributions
- Tri-modal forensic architecture combining ConvNeXt-tiny (global visual inconsistencies), Swin Transformer-tiny (fine-grained texture anomalies), and a frequency-domain CNN with channel attention (spectral noise patterns)
- Attention-based temporal pooling that dynamically prioritizes high-evidence frames and adaptive branch fusion weighted by forgery type
- Achieves AUC 0.9752, F1 0.9408, and accuracy 0.9208 on Celeb-DF(v2), with Grad-CAM confirming focus on genuine manipulation regions
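The adaptive fusion step, weighting the three branches by forgery type, is likewise only named here, not specified. A plausible minimal sketch is a small gating head that scores each branch from the concatenated branch features and softmax-normalizes the scores into fusion weights (all parameter names below are hypothetical):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_fusion(branch_feats, gate_W, gate_b, cls_W):
    """Fuse RGB, texture, and frequency branch features (each (D,))
    with input-dependent weights.

    A gating head scores each branch from the concatenated features;
    the softmax-normalized scores weight the branches before a final
    linear classifier produces the fake/real logit.
    """
    concat = np.concatenate(branch_feats)         # (3D,)
    weights = softmax(gate_W @ concat + gate_b)   # (3,) per-branch weights
    fused = sum(w * f for w, f in zip(weights, branch_feats))  # (D,)
    return cls_W @ fused                          # scalar logit
```

Because the weights depend on the input, a face-swap clip can lean on the spatial branches while a fully synthetic clip leans on the frequency branch, which is the behavior "weighted by forgery type" suggests.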
🛡️ Threat Analysis
Proposes a novel forensic detection architecture for AI-generated/manipulated video (deepfakes), directly addressing output integrity and authenticity of synthetic media — the canonical ML09 use case.