Defense · 2025

Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning

Haiyang Zheng 1,2, Nan Pu 1,2, Wenjing Li, Teng Long 1,2, Nicu Sebe 1, Zhun Zhong

1 citation · 109 references · arXiv


Published on arXiv · 2512.12667

Output Integrity Attack

OWASP ML Top 10: ML09

Key Finding

CAL achieves a new state-of-the-art on both known and novel forgery attribution, significantly narrowing the performance gap between known and novel types relative to prior OW-DFA methods such as CPL.

CAL (Confidence-Aware Asymmetric Learning)

Novel technique introduced


The proliferation of synthetic facial imagery has intensified the need for robust Open-World DeepFake Attribution (OW-DFA), which aims to attribute both known and unknown forgeries using labeled data for known types and unlabeled data containing a mixture of known and novel types. However, existing OW-DFA methods face two critical limitations: 1) A confidence skew that leads to unreliable pseudo-labels for novel forgeries, resulting in biased training. 2) An unrealistic assumption that the number of unknown forgery types is known *a priori*. To address these challenges, we propose a Confidence-Aware Asymmetric Learning (CAL) framework, which adaptively balances model confidence across known and novel forgery types. CAL mainly consists of two components: Confidence-Aware Consistency Regularization (CCR) and Asymmetric Confidence Reinforcement (ACR). CCR mitigates pseudo-label bias by dynamically scaling sample losses based on normalized confidence, gradually shifting the training focus from high- to low-confidence samples. ACR complements this by separately calibrating confidence for known and novel classes through selective learning on high-confidence samples, guided by their confidence gap. Together, CCR and ACR form a mutually reinforcing loop that significantly improves the model's OW-DFA performance. Moreover, we introduce a Dynamic Prototype Pruning (DPP) strategy that automatically estimates the number of novel forgery types in a coarse-to-fine manner, removing the need for unrealistic prior assumptions and enhancing the scalability of our methods to real-world OW-DFA scenarios. Extensive experiments on the standard OW-DFA benchmark and a newly extended benchmark incorporating advanced manipulations demonstrate that CAL consistently outperforms previous methods, achieving new state-of-the-art performance on both known and novel forgery attribution.
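The abstract describes CCR as dynamically scaling per-sample losses by normalized confidence, shifting emphasis from high- to low-confidence samples over training. A minimal sketch of that idea follows; the min-max normalization and the linear interpolation schedule (`t / t_max`) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ccr_weights(confidences, t, t_max):
    """Confidence-aware sample weights (hypothetical schedule).

    Early in training (t small) the weights favor high-confidence samples;
    late in training (t near t_max) they favor low-confidence ones, mirroring
    the high-to-low confidence shift CCR is described as performing.
    """
    c = np.asarray(confidences, dtype=float)
    c_norm = (c - c.min()) / (c.max() - c.min() + 1e-8)  # normalize to [0, 1]
    alpha = t / t_max  # training progress in [0, 1]
    # Interpolate between "trust confident samples" and "focus on the rest".
    return (1 - alpha) * c_norm + alpha * (1 - c_norm)

def ccr_loss(per_sample_losses, confidences, t, t_max):
    """Confidence-weighted mean of per-sample losses."""
    w = ccr_weights(confidences, t, t_max)
    return float(np.mean(w * np.asarray(per_sample_losses, dtype=float)))
```

With two samples at confidence 0.9 and 0.1, the weights start at (1, 0) and end at (0, 1) as `t` goes from 0 to `t_max`, so the low-confidence (likely novel) sample gradually takes over the loss.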


Key Contributions

  • Identifies confidence skew as a critical failure mode in existing OW-DFA methods, where models assign unreliably low confidence to novel forgery types, creating a negative pseudo-label feedback loop
  • Proposes CAL framework with CCR (dynamically re-weights sample losses by normalized confidence to shift focus from high- to low-confidence novel samples) and ACR (asymmetrically calibrates confidence separately for known vs. novel classes via selective high-confidence learning)
  • Introduces Dynamic Prototype Pruning (DPP) that automatically estimates the number of novel forgery types via coarse-to-fine prototype merging, eliminating the unrealistic a priori assumption about the count of unknown categories
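The coarse-to-fine merging behind DPP can be sketched as follows: start from an over-provisioned prototype set and greedily merge the most similar pair until no pair is similar enough, so the surviving count estimates the number of novel types. The cosine-similarity criterion and `merge_thresh` value here are hypothetical stand-ins for the paper's actual pruning rule:

```python
import numpy as np

def prune_prototypes(prototypes, merge_thresh=0.9):
    """Greedy coarse-to-fine prototype pruning (illustrative sketch).

    Prototypes are L2-normalized; the most similar pair is merged (summed and
    renormalized) while its cosine similarity exceeds merge_thresh. The number
    of survivors serves as the estimated class count.
    """
    protos = [p / np.linalg.norm(p) for p in np.asarray(prototypes, dtype=float)]
    while len(protos) > 1:
        # Cosine similarity of every pair (vectors are unit-norm).
        sims = [(float(protos[i] @ protos[j]), i, j)
                for i in range(len(protos)) for j in range(i + 1, len(protos))]
        best, i, j = max(sims)
        if best < merge_thresh:
            break  # no pair is similar enough to merge
        merged = protos[i] + protos[j]
        merged /= np.linalg.norm(merged)
        protos = [p for k, p in enumerate(protos) if k not in (i, j)] + [merged]
    return np.stack(protos)
```

For example, two near-duplicate directions plus one orthogonal direction collapse to two prototypes, while two orthogonal directions survive untouched.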

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection and attribution — proposes novel architecture (CAL with CCR, ACR, DPP components) to detect and attribute synthetic facial images to their specific forgery model type, including unknown novel forgery types in an open-world setting.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
inference_time
Datasets
OW-DFA benchmark, Extended OW-DFA benchmark
Applications
deepfake detection, synthetic face attribution, forensic provenance