Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing
Danial Samadi Vahdati, Tai Duc Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm
Published on arXiv (arXiv:2510.03548)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Consistently outperforms existing puppeteering defenses across multiple talking-head generation models while achieving real-time operation and strong generalization to out-of-distribution scenarios
AI-based talking-head videoconferencing systems reduce bandwidth by transmitting a compact pose-expression latent and re-synthesizing RGB video at the receiver. This latent, however, can be puppeteered, letting an attacker hijack a victim's likeness in real time. Because every frame is synthetic, deepfake and synthetic-video detectors fail outright. To address this security problem, we exploit a key observation: the pose-expression latent inherently contains biometric information about the driving identity. We therefore introduce the first defense that leverages this biometric leakage without ever looking at the reconstructed RGB video: a pose-conditioned, large-margin contrastive encoder that isolates persistent identity cues inside the transmitted latent while cancelling transient pose and expression. A simple cosine test on this disentangled embedding flags illicit identity swaps as the video is rendered. Experiments on multiple talking-head generation models show that our method consistently outperforms existing puppeteering defenses, operates in real time, and generalizes strongly to out-of-distribution scenarios.
Key Contributions
- First defense that operates directly on the transmitted pose-expression latent to detect puppeteering attacks — without ever reconstructing the RGB video
- Pose-conditioned large-margin contrastive encoder that disentangles persistent biometric identity cues from transient pose and expression within the latent space
- Real-time cosine similarity test on disentangled embeddings that flags illicit identity swaps across multiple talking-head generation models with strong out-of-distribution generalization
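The large-margin contrastive objective described above can be illustrated with a minimal sketch. This is not the paper's implementation; the margin value and the plain triplet form are illustrative assumptions. The idea is that two embeddings of the same identity under different poses should be more similar (in cosine terms) than embeddings of different identities, by at least a fixed margin:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_margin_loss(anchor, positive, negative, margin=0.3):
    """Illustrative large-margin contrastive loss on identity embeddings.

    anchor/positive: the same identity under different pose/expression;
    negative: a different identity. The loss is zero only when the
    same-identity similarity exceeds the cross-identity similarity
    by at least `margin` (margin=0.3 is an assumed value).
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```

Training with such a loss pushes pose and expression variation out of the identity embedding, so that only persistent biometric cues determine similarity.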
🛡️ Threat Analysis
The paper addresses output integrity of AI-synthesized video: an attacker hijacks a victim's likeness by puppeteering the pose-expression latent, producing unauthorized deepfake frames. The proposed defense detects this impersonation by verifying the biometric identity embedded in the latent before RGB reconstruction — a deepfake/AI-generated content authentication problem squarely in the output integrity space.
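The verification step this implies can be sketched as a simple threshold test: compare each incoming frame's identity embedding against an embedding enrolled for the legitimate caller, and flag the stream if the similarity drops. The function name, threshold, and enrollment scheme below are assumptions for illustration, not the paper's API:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_puppeteered(enrolled, frame_embeddings, threshold=0.5):
    """Hypothetical per-stream check: flag an identity swap if any
    frame's identity embedding (extracted from the transmitted latent)
    falls below an assumed cosine-similarity threshold relative to the
    enrolled identity embedding."""
    return any(cosine(enrolled, e) < threshold for e in frame_embeddings)
```

Because the embeddings are computed directly from the transmitted latent, this check runs before any RGB frame is synthesized, which is what makes real-time operation feasible.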