Fairness-Aware Deepfake Detection: Leveraging Dual-Mechanism Optimization

Fairness is central to the trustworthy deployment of deepfake detection models, especially in digital identity security. A detector that is biased toward particular demographic groups, such as those defined by gender or race, can produce systematic misjudgments that widen the digital divide and deepen social inequities. However, current fairness-enhanced detectors often improve fairness at the cost of detection accuracy. To address this trade-off, we propose a dual-mechanism collaborative optimization framework that integrates structural fairness decoupling with global distribution alignment. At the architectural level, the method decouples channels that are sensitive to demographic groups. At the feature level, it then reduces the distance between the overall sample distribution and the distribution of each demographic group. Experimental results show that, compared with other methods, the framework improves both inter-group and intra-group fairness while maintaining overall detection accuracy across domains.

Key Contributions

Structural fairness decoupling that separates demographic-sensitive channels at the architectural level to isolate group-specific biases
Global distribution alignment module that reduces feature-space distance between the overall sample distribution and per-demographic-group distributions
Demonstrated improvement in both inter-group and intra-group fairness while preserving cross-domain detection accuracy compared to prior fairness-enhanced detectors
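The global distribution alignment idea can be illustrated with a small sketch. The summary does not say which distance the authors use, so the code below assumes the simplest stand-in: the squared Euclidean distance between the overall feature mean and each group's feature mean, averaged over groups. The names `mean_vector` and `alignment_penalty` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def alignment_penalty(features, groups):
    """Average squared Euclidean distance between the overall feature mean
    and each demographic group's feature mean.

    Assumption: mean matching stands in for whatever distribution distance
    the paper actually uses, which this summary does not specify.
    """
    by_group = defaultdict(list)
    for vec, g in zip(features, groups):
        by_group[g].append(vec)

    overall = mean_vector(features)
    total = 0.0
    for rows in by_group.values():
        mu = mean_vector(rows)
        total += sum((a - b) ** 2 for a, b in zip(mu, overall))
    return total / len(by_group)


# Toy 2-D features for two hypothetical groups "a" and "b".
features = [[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [2.0, 4.0]]
groups = ["a", "a", "b", "b"]
penalty = alignment_penalty(features, groups)

# If both groups share the same features, the penalty drops to zero.
aligned = alignment_penalty([[1.0, 2.0], [3.0, 4.0], [1.0, 2.0], [3.0, 4.0]],
                            ["a", "a", "b", "b"])
```

In a training loop, a term like this would typically be added to the detection loss with a weighting coefficient, so that per-group feature statistics are pulled toward the overall statistics without replacing the detection objective.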

🛡️ Threat Analysis

Output Integrity Attack

The paper's primary contribution is a new deepfake detection framework. AI-generated content detection, which includes deepfake detection, is explicitly enumerated under ML09 (Output Integrity Attack). The dual-mechanism architecture targets equitable detection performance across demographic groups while maintaining overall detection accuracy.
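As an illustration of what "equitable detection performance across demographic groups" can mean in practice, the sketch below computes per-group accuracy and the gap between the best and worst group. This is one common way to quantify inter-group disparity, not necessarily the metric used in the paper; the function names and the toy labels are hypothetical.

```python
def per_group_accuracy(labels, preds, groups):
    """Accuracy of the detector computed separately for each group."""
    correct, count = {}, {}
    for y, p, g in zip(labels, preds, groups):
        count[g] = count.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (1 if y == p else 0)
    return {g: correct[g] / count[g] for g in count}


def accuracy_gap(labels, preds, groups):
    """Difference between the best and worst per-group accuracy.

    Assumption: a max-min gap is used here as a simple disparity measure;
    the paper's own fairness metrics are not given in this summary.
    """
    acc = per_group_accuracy(labels, preds, groups)
    return max(acc.values()) - min(acc.values())


# Toy example: 1 = fake, 0 = real, two hypothetical groups "a" and "b".
labels = [1, 0, 1, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

group_acc = per_group_accuracy(labels, preds, groups)
gap = accuracy_gap(labels, preds, groups)
overall_acc = sum(y == p for y, p in zip(labels, preds)) / len(labels)
```

Reporting the gap alongside the overall accuracy makes the trade-off described in the abstract visible: a method can shrink the gap while the overall number falls, which is the failure mode the framework aims to avoid.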

Details

Domains

vision

Model Types

cnn, transformer

Threat Tags

digital, inference_time

Datasets

FaceForensics++, CelebDF

Applications

2025, 0 citations
