benchmark 2026

AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures

Sifatullah Sheikh Urmi, Kirtonia Nuzath Tabassum Arthi, Md Al-Imran

1 citation · 15 references · IBDAP


Published on arXiv

2601.01281

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

VFDNET achieved superior deepfake detection accuracy among four evaluated architectures, with MobileNetV3 offering the best efficiency trade-off.

VFDNET

Novel technique introduced


The increasing use of AI-generated deepfakes creates major challenges for maintaining digital authenticity. Four AI-based models, three CNNs and one Vision Transformer, were evaluated on large face image datasets. Data preprocessing and augmentation techniques improved model performance across different scenarios. VFDNET achieved the highest accuracy, while MobileNetV3 offered the most efficient performance, demonstrating AI's capability for dependable deepfake detection.
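The accuracy comparison above is a binary real/fake classification metric. The following is a minimal sketch, not the authors' evaluation code, of how accuracy, precision, and recall could be computed for such a detector. It assumes label 1 = fake and 0 = real, and that each model outputs a sigmoid score in [0, 1] thresholded at 0.5. The function name `binary_metrics` is hypothetical.

```python
import numpy as np


def binary_metrics(scores, labels, threshold=0.5):
    """Accuracy, precision and recall for real (0) vs fake (1) predictions.

    Assumption: label 1 means "fake", 0 means "real", and `scores` are
    sigmoid probabilities of the "fake" class.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (scores >= threshold).astype(int)  # hard real/fake decisions

    # Confusion-matrix counts with "fake" treated as the positive class.
    tp = int(np.sum((preds == 1) & (labels == 1)))
    tn = int(np.sum((preds == 0) & (labels == 0)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


m = binary_metrics([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 1])
```

Reporting precision and recall alongside accuracy shows whether a model's errors come from flagging real faces as fake or from missing fakes, which a single accuracy figure hides.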


Key Contributions

  • Comparative evaluation of DFCNET, MobileNetV3, ResNet50, and VFDNET for binary real/fake face classification
  • Preprocessing and augmentation pipeline (normalization, rotation, scaling, histogram equalization) to improve generalization
  • Empirical finding that VFDNET achieves highest accuracy while MobileNetV3 offers efficient performance
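The preprocessing steps listed above (normalization, rotation, scaling, histogram equalization) can be sketched in plain NumPy as follows. This is an illustrative sketch, not the paper's pipeline. Several choices are assumptions: the input is a grayscale uint8 face image (for color images one would typically equalize a luminance channel), "scaling" is taken to mean nearest-neighbor resizing to a fixed input size, rotation is a random angle in ±15° applied only during training, and normalization maps pixels to [0, 1]. All function names and parameter values are hypothetical.

```python
import numpy as np


def equalize_histogram(gray):
    """Histogram equalization for a 2-D uint8 image via a CDF lookup table."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    total = gray.size
    if total == cdf_min:  # constant image: nothing to spread out
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[gray]


def rotate_nearest(img, angle_deg):
    """Rotate about the image center (nearest neighbor, zero fill outside)."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w]
    c, s = np.cos(theta), np.sin(theta)
    # Inverse mapping: for each output pixel, find its source coordinate.
    src_x = np.round(c * (xx - cx) + s * (yy - cy) + cx).astype(int)
    src_y = np.round(-s * (xx - cx) + c * (yy - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize to a fixed resolution."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]


def preprocess(gray, rng, size=224, max_angle=15.0, train=True):
    """Equalize, optionally rotate (training only), resize, normalize to [0, 1]."""
    img = equalize_histogram(gray)
    if train:
        img = rotate_nearest(img, rng.uniform(-max_angle, max_angle))
    img = resize_nearest(img, size, size)
    return img.astype(np.float32) / 255.0


rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(64, 48), dtype=np.uint8)
x = preprocess(face, rng, size=32)
```

Keeping rotation inside the `train` branch reflects the usual convention that augmentation is applied only to training data, while equalization, resizing, and normalization run identically at inference time.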

🛡️ Threat Analysis

Output Integrity Attack

The paper evaluates models for detecting AI-generated deepfake face images, directly addressing output integrity and content authenticity, the core concern of ML09.


Details

Domains
vision
Model Types
CNN, Transformer
Threat Tags
inference_time
Datasets
140K Real and Fake Faces (Kaggle)
Applications
deepfake detection, facial image authenticity verification