AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures
Sifatullah Sheikh Urmi , Kirtonia Nuzath Tabassum Arthi , Md Al-Imran
Published on arXiv
2601.01281
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
VFDNET achieved the highest deepfake detection accuracy of the four evaluated architectures, while MobileNetV3 offered the best efficiency trade-off.
VFDNET
Novel technique introduced
The increasing use of AI-generated deepfakes creates major challenges for maintaining digital authenticity. Four AI-based models, three CNNs and one Vision Transformer, were evaluated on large face image datasets. Data preprocessing and augmentation techniques improved model performance across different scenarios. VFDNET demonstrated the highest accuracy, while MobileNetV3 showed efficient performance, indicating that AI-based models can support dependable deepfake detection.
Key Contributions
- Comparative evaluation of DFCNET, MobileNetV3, ResNet50, and VFDNET for binary real/fake face classification
- Preprocessing and augmentation pipeline (normalization, rotation, scaling, histogram equalization) to improve generalization
- Empirical finding that VFDNET achieves the highest accuracy while MobileNetV3 offers efficient performance
🛡️ Threat Analysis
The paper evaluates models for detecting AI-generated deepfake face images, directly addressing output integrity and content authenticity, the core concern of ML09.