Defense · 2025

PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis

Soumyya Kanti Datta, Tanvi Ranga, Chengzhe Sun, Siwei Lyu

2 citations · 57 references · ICCVW


Published on arXiv: 2510.14241

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Multimodal fusion of phoneme-temporal and identity-dynamic cues significantly improves detection of subtle deepfake alterations over unimodal and manually designed threshold-based baselines.

PIA (Phoneme-Temporal and Identity-Dynamic Analysis)

Novel technique introduced


The rise of manipulated media has made deepfakes a particularly insidious threat, spanning generative manipulations such as lip-sync modifications, face-swaps, and avatar-driven facial synthesis. Conventional detection methods, which predominantly depend on manually designed phoneme-viseme alignment thresholds, basic frame-level consistency checks, or unimodal detection strategies, fail to reliably identify modern deepfakes produced by advanced generative models such as GANs, diffusion models, and neural rendering techniques. These techniques generate nearly perfect individual frames yet inadvertently introduce minor temporal discrepancies that traditional detectors frequently overlook. We present a novel multimodal audio-visual framework, Phoneme-Temporal and Identity-Dynamic Analysis (PIA), that incorporates language, dynamic face motion, and facial identity cues to address these limitations. We utilize phoneme sequences, lip geometry data, and advanced facial identity embeddings. This integrated method significantly improves the detection of subtle deepfake alterations by identifying inconsistencies across multiple complementary modalities. Code is available at https://github.com/skrantidatta/PIA.
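For intuition on the identity-dynamic cue described above, here is a toy sketch (not the paper's implementation; the function names and two-dimensional embeddings are illustrative assumptions) of how frame-to-frame drift in facial identity embeddings can expose a manipulated clip: in genuine footage the subject's identity embedding stays nearly constant, while face-swapped frames tend to fluctuate slightly.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identity_jitter(embeddings):
    """Mean drop in consecutive-frame identity similarity.

    Near 0 for stable (likely real) identity tracks; larger values
    indicate the per-frame identity drift typical of face-swaps.
    """
    sims = [cosine(embeddings[i], embeddings[i + 1])
            for i in range(len(embeddings) - 1)]
    return 1.0 - sum(sims) / len(sims)
```

A stable track like `[[1, 0], [1, 0], [1, 0]]` scores near 0, while an oscillating one scores well above it; a real detector would of course learn such temporal statistics rather than hand-threshold them, which is precisely the limitation of prior work that PIA targets.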


Key Contributions

  • Novel multimodal audio-visual detection framework (PIA) combining phoneme sequences, lip geometry, and facial identity embeddings
  • Addresses limitations of unimodal and hand-tuned threshold-based detectors against advanced generative models (GANs, diffusion, neural rendering)
  • Detects subtle temporal discrepancies across complementary modalities missed by conventional frame-level methods
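The fusion idea in the contributions above can be sketched as a simple late-fusion classifier: pool each modality (phoneme features, lip geometry, identity embeddings) into a fixed-size vector, concatenate, and score with a linear head. This is a minimal illustration under assumed toy dimensions, not the PIA architecture itself.

```python
import math

def encode(seq, dim=4):
    """Toy per-modality encoder: mean-pool a sequence of dim-sized feature vectors."""
    return [sum(v[i] for v in seq) / len(seq) for i in range(dim)]

def fuse_and_score(phoneme_feats, lip_feats, identity_feats, weights, bias):
    """Late fusion: concatenate pooled modality embeddings,
    then apply a linear head with a sigmoid to get P(fake)."""
    fused = encode(phoneme_feats) + encode(lip_feats) + encode(identity_feats)
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

With all-zero weights the score is the uninformative 0.5; a trained head would weight whichever modality exposes an inconsistency, which is how cross-modal fusion catches alterations that any single modality misses.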

🛡️ Threat Analysis

Output Integrity Attack

PIA is a novel deepfake detection framework targeting AI-generated/manipulated media (lip-sync, face-swap, avatar synthesis via GANs, diffusion models, neural rendering) — directly addressing output integrity and AI-generated content detection.


Details

Domains
vision · audio · multimodal · nlp
Model Types
multimodal · gan · diffusion
Threat Tags
inference_time
Applications
deepfake detection · lip-sync manipulation detection · face-swap detection · avatar-driven facial synthesis detection