
Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing

Danial Samadi Vahdati¹, Tai Duc Nguyen¹, Ekta Prashnani², Koki Nagano², David Luebke², Orazio Gallo², Matthew Stamm¹

0 citations · 70 references · arXiv


Published on arXiv · 2510.03548

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Consistently outperforms existing puppeteering defenses across multiple talking-head generation models while achieving real-time operation and strong generalization to out-of-distribution scenarios


AI-based talking-head videoconferencing systems reduce bandwidth by sending a compact pose-expression latent and re-synthesizing RGB at the receiver, but this latent can be puppeteered, letting an attacker hijack a victim's likeness in real time. Because every frame is synthetic, deepfake and synthetic video detectors fail outright. To address this security problem, we exploit a key observation: the pose-expression latent inherently contains biometric information about the driving identity. Building on this leakage, we introduce the first defense that never looks at the reconstructed RGB video: a pose-conditioned, large-margin contrastive encoder that isolates persistent identity cues inside the transmitted latent while cancelling transient pose and expression. A simple cosine test on this disentangled embedding flags illicit identity swaps as the video is rendered. Our experiments on multiple talking-head generation models show that our method consistently outperforms existing puppeteering defenses, operates in real time, and generalizes strongly to out-of-distribution scenarios.
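The detection rule described above, a cosine test on the disentangled identity embedding, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the threshold value are assumptions, and in practice the threshold would be calibrated on genuine self-driven calls.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_puppeteered(enrolled: np.ndarray, frame_embedding: np.ndarray,
                   threshold: float = 0.5) -> bool:
    """Flag a frame as an illicit identity swap when the identity
    embedding recovered from the transmitted latent drifts too far
    from the enrolled sender's reference embedding.
    The 0.5 threshold is a placeholder, not a value from the paper."""
    return cosine_similarity(enrolled, frame_embedding) < threshold
```

Because the test is a single dot product per frame, it adds negligible overhead to rendering, which is consistent with the real-time operation claimed above.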


Key Contributions

  • First defense that operates directly on the transmitted pose-expression latent to detect puppeteering attacks — without ever reconstructing the RGB video
  • Pose-conditioned large-margin contrastive encoder that disentangles persistent biometric identity cues from transient pose and expression within the latent space
  • Real-time cosine similarity test on disentangled embeddings that flags illicit identity swaps across multiple talking-head generation models with strong out-of-distribution generalization
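The large-margin contrastive objective named in the second contribution can be illustrated with a minimal margin-based pair loss over cosine similarity. The margin value and pairing scheme here are illustrative assumptions; the paper's formulation additionally conditions on pose, which this sketch omits.

```python
import numpy as np

def margin_contrastive_loss(z1: np.ndarray, z2: np.ndarray,
                            same_identity: bool, margin: float = 0.3) -> float:
    """Margin-based contrastive loss over cosine similarity.

    Positive pairs (same driving identity, different pose/expression)
    are pulled toward similarity 1; negative pairs (different
    identities) are pushed below the margin, separating persistent
    identity cues from transient pose and expression.
    """
    sim = float(np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2)))
    if same_identity:
        return 1.0 - sim              # pull positives together
    return max(0.0, sim - margin)     # push negatives below the margin
```

Negatives already farther apart than the margin contribute zero loss, so training capacity is spent on hard identity confusions rather than on pairs that are trivially distinguishable.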

🛡️ Threat Analysis

Output Integrity Attack

The paper addresses output integrity of AI-synthesized video: an attacker hijacks a victim's likeness by puppeteering the pose-expression latent, producing unauthorized deepfake frames. The proposed defense detects this impersonation by verifying the biometric identity embedded in the latent before RGB reconstruction — a deepfake/AI-generated content authentication problem squarely in the output integrity space.
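The receiver-side enforcement implied by this paragraph can be sketched as a gate that verifies the latent's identity embedding before any frame is synthesized. All function names below are hypothetical placeholders standing in for the system's actual components.

```python
from typing import Callable

def guarded_render(latent, enrolled_embedding,
                   extract_identity: Callable, verify: Callable,
                   render_rgb: Callable):
    """Render a frame only if the identity recovered from the
    transmitted pose-expression latent matches the enrolled sender.
    Returns None (skipping synthesis) on a suspected identity swap."""
    frame_id = extract_identity(latent)      # disentangled identity cues
    if not verify(enrolled_embedding, frame_id):
        return None                          # drop/flag puppeteered frame
    return render_rgb(latent)                # safe to synthesize RGB
```

Gating before reconstruction is what distinguishes this defense from post-hoc deepfake detectors: the unauthorized frame is never rendered, rather than rendered and then judged.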


Details

Domains
vision, generative
Model Types
generative
Threat Tags
inference_time, targeted, black_box
Applications
AI-based videoconferencing, talking-head video synthesis, real-time deepfake impersonation detection