Defense · 2025

Open-World Deepfake Attribution via Confidence-Aware Asymmetric Learning

Haiyang Zheng 1,2, Nan Pu 1,2, Wenjing Li, Teng Long 1,2, Nicu Sebe 1, Zhun Zhong

1 citation · 109 references · arXiv


Published on arXiv · 2512.12667

Output Integrity Attack

OWASP ML Top 10: ML09

Key Finding

CAL achieves a new state-of-the-art on both known and novel forgery attribution, significantly narrowing the performance gap between known and novel types relative to prior OW-DFA methods such as CPL.

CAL (Confidence-Aware Asymmetric Learning)

Novel technique introduced


The proliferation of synthetic facial imagery has intensified the need for robust Open-World DeepFake Attribution (OW-DFA), which aims to attribute both known and unknown forgeries using labeled data for known types and unlabeled data containing a mixture of known and novel types. However, existing OW-DFA methods face two critical limitations: 1) A confidence skew that leads to unreliable pseudo-labels for novel forgeries, resulting in biased training. 2) An unrealistic assumption that the number of unknown forgery types is known *a priori*. To address these challenges, we propose a Confidence-Aware Asymmetric Learning (CAL) framework, which adaptively balances model confidence across known and novel forgery types. CAL mainly consists of two components: Confidence-Aware Consistency Regularization (CCR) and Asymmetric Confidence Reinforcement (ACR). CCR mitigates pseudo-label bias by dynamically scaling sample losses based on normalized confidence, gradually shifting the training focus from high- to low-confidence samples. ACR complements this by separately calibrating confidence for known and novel classes through selective learning on high-confidence samples, guided by their confidence gap. Together, CCR and ACR form a mutually reinforcing loop that significantly improves the model's OW-DFA performance. Moreover, we introduce a Dynamic Prototype Pruning (DPP) strategy that automatically estimates the number of novel forgery types in a coarse-to-fine manner, removing the need for unrealistic prior assumptions and enhancing the scalability of our methods to real-world OW-DFA scenarios. Extensive experiments on the standard OW-DFA benchmark and a newly extended benchmark incorporating advanced manipulations demonstrate that CAL consistently outperforms previous methods, achieving new state-of-the-art performance on both known and novel forgery attribution.
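The abstract describes CCR as dynamically scaling per-sample losses by normalized confidence, shifting emphasis from high- to low-confidence samples over training. A minimal sketch of that idea follows; the min-max normalization and the linear interpolation schedule (`t / t_max`) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ccr_weights(confidences, t, t_max):
    """Confidence-aware sample weights (hypothetical schedule).

    Early in training (t small) the weights favor high-confidence samples;
    late in training (t near t_max) they favor low-confidence ones, mirroring
    the high-to-low confidence shift CCR is described as performing.
    """
    c = np.asarray(confidences, dtype=float)
    c_norm = (c - c.min()) / (c.max() - c.min() + 1e-8)  # normalize to [0, 1]
    alpha = t / t_max  # training progress in [0, 1]
    # Interpolate between "trust confident samples" and "focus on the rest".
    return (1 - alpha) * c_norm + alpha * (1 - c_norm)

def ccr_loss(per_sample_losses, confidences, t, t_max):
    """Confidence-weighted mean of per-sample losses."""
    w = ccr_weights(confidences, t, t_max)
    return float(np.mean(w * np.asarray(per_sample_losses, dtype=float)))
```

With two samples at confidence 0.9 and 0.1, the weights start at (1, 0) and end at (0, 1) as `t` goes from 0 to `t_max`, so the low-confidence (likely novel) sample gradually takes over the loss.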


Key Contributions

  • Identifies confidence skew as a critical failure mode in existing OW-DFA methods, where models assign unreliably low confidence to novel forgery types, creating a negative pseudo-label feedback loop
  • Proposes CAL framework with CCR (dynamically re-weights sample losses by normalized confidence to shift focus from high- to low-confidence novel samples) and ACR (asymmetrically calibrates confidence separately for known vs. novel classes via selective high-confidence learning)
  • Introduces Dynamic Prototype Pruning (DPP) that automatically estimates the number of novel forgery types via coarse-to-fine prototype merging, eliminating the unrealistic a priori assumption about the count of unknown categories
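The coarse-to-fine merging behind DPP can be sketched as follows: start from an over-provisioned prototype set and greedily merge the most similar pair until no pair is similar enough, so the surviving count estimates the number of novel types. The cosine-similarity criterion and `merge_thresh` value here are hypothetical stand-ins for the paper's actual pruning rule:

```python
import numpy as np

def prune_prototypes(prototypes, merge_thresh=0.9):
    """Greedy coarse-to-fine prototype pruning (illustrative sketch).

    Prototypes are L2-normalized; the most similar pair is merged (summed and
    renormalized) while its cosine similarity exceeds merge_thresh. The number
    of survivors serves as the estimated class count.
    """
    protos = [p / np.linalg.norm(p) for p in np.asarray(prototypes, dtype=float)]
    while len(protos) > 1:
        # Cosine similarity of every pair (vectors are unit-norm).
        sims = [(float(protos[i] @ protos[j]), i, j)
                for i in range(len(protos)) for j in range(i + 1, len(protos))]
        best, i, j = max(sims)
        if best < merge_thresh:
            break  # no pair is similar enough to merge
        merged = protos[i] + protos[j]
        merged /= np.linalg.norm(merged)
        protos = [p for k, p in enumerate(protos) if k not in (i, j)] + [merged]
    return np.stack(protos)
```

For example, two near-duplicate directions plus one orthogonal direction collapse to two prototypes, while two orthogonal directions survive untouched.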

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection and attribution — proposes novel architecture (CAL with CCR, ACR, DPP components) to detect and attribute synthetic facial images to their specific forgery model type, including unknown novel forgery types in an open-world setting.


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
inference_time
Datasets
OW-DFA benchmark, Extended OW-DFA benchmark
Applications
deepfake detection, synthetic face attribution, forensic provenance