Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning
Haiyang Zheng, Nan Pu, Wenjing Li, Teng Long, Nicu Sebe, Zhun Zhong
Published on arXiv (arXiv:2512.12667)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
CAL achieves a new state of the art on both known and novel forgery attribution, significantly narrowing the performance gap between known and novel types relative to prior OW-DFA methods such as CPL.
CAL (Confidence-Aware Asymmetric Learning)
Novel technique introduced
The proliferation of synthetic facial imagery has intensified the need for robust Open-World DeepFake Attribution (OW-DFA), which aims to attribute both known and unknown forgeries using labeled data for known types and unlabeled data containing a mixture of known and novel types. However, existing OW-DFA methods face two critical limitations: 1) a confidence skew that leads to unreliable pseudo-labels for novel forgeries, resulting in biased training; and 2) an unrealistic assumption that the number of unknown forgery types is known *a priori*. To address these challenges, we propose a Confidence-Aware Asymmetric Learning (CAL) framework, which adaptively balances model confidence across known and novel forgery types. CAL mainly consists of two components: Confidence-Aware Consistency Regularization (CCR) and Asymmetric Confidence Reinforcement (ACR). CCR mitigates pseudo-label bias by dynamically scaling sample losses based on normalized confidence, gradually shifting the training focus from high- to low-confidence samples. ACR complements this by separately calibrating confidence for known and novel classes through selective learning on high-confidence samples, guided by their confidence gap. Together, CCR and ACR form a mutually reinforcing loop that significantly improves the model's OW-DFA performance. Moreover, we introduce a Dynamic Prototype Pruning (DPP) strategy that automatically estimates the number of novel forgery types in a coarse-to-fine manner, removing the need for unrealistic prior assumptions and enhancing the scalability of our method to real-world OW-DFA scenarios. Extensive experiments on the standard OW-DFA benchmark and a newly extended benchmark incorporating advanced manipulations demonstrate that CAL consistently outperforms previous methods, achieving new state-of-the-art performance on both known and novel forgery attribution.
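The CCR idea of shifting training focus from high- to low-confidence samples can be sketched as a confidence-normalized loss re-weighting. The snippet below is a minimal illustrative sketch, not the paper's exact formulation: the `ccr_weights` function, the linear interpolation schedule, and the `progress` parameter are all assumptions introduced here for clarity.

```python
import numpy as np

def ccr_weights(confidences, progress):
    """Illustrative confidence-aware re-weighting (hypothetical sketch,
    not the paper's exact CCR formula).

    confidences : per-sample pseudo-label confidences in [0, 1]
    progress    : training progress in [0, 1]; early training emphasizes
                  high-confidence samples, late training shifts weight
                  toward low-confidence (often novel-type) samples.
    """
    c = np.asarray(confidences, dtype=float)
    # normalize confidences within the batch to [0, 1]
    c_norm = (c - c.min()) / (c.max() - c.min() + 1e-8)
    # interpolate between favoring high- and low-confidence samples
    w = (1.0 - progress) * c_norm + progress * (1.0 - c_norm)
    return w / (w.sum() + 1e-8)  # weights sum to 1

# early in training, high-confidence samples carry most of the loss weight
w_early = ccr_weights([0.9, 0.5, 0.1], progress=0.0)
# late in training, the focus has shifted to low-confidence samples
w_late = ccr_weights([0.9, 0.5, 0.1], progress=1.0)
```

These weights would multiply per-sample losses, so that early pseudo-label noise on low-confidence novel samples is down-weighted, and only later does the model lean into them.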
Key Contributions
- Identifies confidence skew as a critical failure mode in existing OW-DFA methods, where models assign unreliably low confidence to novel forgery types, creating a negative pseudo-label feedback loop
- Proposes CAL framework with CCR (dynamically re-weights sample losses by normalized confidence to shift focus from high- to low-confidence novel samples) and ACR (asymmetrically calibrates confidence separately for known vs. novel classes via selective high-confidence learning)
- Introduces Dynamic Prototype Pruning (DPP) that automatically estimates the number of novel forgery types via coarse-to-fine prototype merging, eliminating the unrealistic a priori assumption about the count of unknown categories
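The coarse-to-fine prototype merging behind DPP can be illustrated with a greedy pairwise-merge loop: over-estimate the number of prototypes, then repeatedly merge the most similar pair until no pair exceeds a similarity threshold. This is a hypothetical sketch of the general idea, not the paper's actual DPP algorithm; `prune_prototypes` and `merge_thresh` are names introduced here.

```python
import numpy as np

def prune_prototypes(prototypes, merge_thresh=0.9):
    """Illustrative coarse-to-fine prototype pruning (hypothetical sketch,
    not the paper's exact DPP procedure): start from an over-estimated set
    of class prototypes and greedily merge the most cosine-similar pair
    until no pair exceeds merge_thresh. The number of surviving prototypes
    is the estimated number of novel forgery types."""
    protos = [p / np.linalg.norm(p) for p in np.asarray(prototypes, dtype=float)]
    while len(protos) > 1:
        # find the most similar prototype pair (unit vectors -> dot = cosine)
        best, pair = -1.0, None
        for i in range(len(protos)):
            for j in range(i + 1, len(protos)):
                sim = float(protos[i] @ protos[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < merge_thresh:
            break  # remaining prototypes are sufficiently distinct
        i, j = pair
        merged = protos[i] + protos[j]      # merge by mean direction
        merged /= np.linalg.norm(merged)
        protos = [p for k, p in enumerate(protos) if k not in (i, j)] + [merged]
    return len(protos)

# two tight feature clusters over-segmented into four initial prototypes
protos = [[1.0, 0.01], [1.0, 0.02], [0.01, 1.0], [0.02, 1.0]]
n_novel = prune_prototypes(protos)  # collapses to 2 estimated types
```

The key property this toy version shares with DPP is that the final count emerges from the data rather than being fixed a priori.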
🛡️ Threat Analysis
Directly addresses AI-generated content detection and attribution: proposes a novel architecture (CAL, with CCR, ACR, and DPP components) that detects synthetic facial images and attributes them to their specific forgery model type, including unknown novel forgery types in an open-world setting.