
Untraceable DeepFakes via Traceable Fingerprint Elimination

Jiewei Lai 1, Lan Zhang 1, Chen Tang 1, Pengcheng Sun 1, Xinming Wang 1, Yunhao Wang 2


Published on arXiv (arXiv:2508.03067)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves a 97.08% average attack success rate (ASR) against 6 state-of-the-art DeepFakes attribution models on outputs from 9 generative models, and sustains an ASR of at least 72.39% even against defensive countermeasures.

Multiplicative Fingerprint Elimination Attack

Novel technique introduced


Recent advancements in DeepFakes attribution technologies have significantly enhanced forensic capabilities, enabling the extraction of traces left by generative models (GMs) in images and making DeepFakes traceable back to their source GMs. Meanwhile, several attacks have attempted to evade attribution models (AMs) in order to probe their limitations, calling for more robust AMs. However, existing attacks fail to eliminate GMs' traces and can therefore be mitigated by defensive measures. In this paper, we identify that untraceable DeepFakes can be achieved through a multiplicative attack, which fundamentally eliminates GMs' traces and thereby evades AMs even when they are enhanced with defensive measures. We design a universal, black-box attack method that trains an adversarial model solely on real data, making it applicable to various GMs and agnostic to AMs. Experimental results demonstrate the outstanding attack capability and universal applicability of our method, which achieves an average attack success rate (ASR) of 97.08% against 6 advanced AMs on DeepFakes generated by 9 GMs. Even in the presence of defensive mechanisms, our method maintains an ASR exceeding 72.39%. Our work underscores the potential challenges posed by multiplicative attacks and highlights the need for more robust AMs.
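
To make the additive/multiplicative distinction concrete, here is a minimal NumPy sketch of a toy setting in which the GM's fingerprint acts multiplicatively on the image spectrum, a common simplification in the attribution literature. The fingerprint `F`, the mask `1/F`, and the correlation score are illustrative constructions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: the GM leaves a multiplicative fingerprint F on the image
# spectrum, i.e. fake = IFFT(F * FFT(clean)). F here is a synthetic
# stand-in, built to be conjugate-symmetric so all images stay real.
clean = rng.normal(size=(64, 64))
F = np.abs(np.fft.fft2(rng.normal(size=(64, 64))))
F /= F.mean()                                  # hypothetical GM fingerprint
fake = np.fft.ifft2(F * np.fft.fft2(clean)).real

# Additive attack: x' = x + delta. Since FFT(x + delta) =
# F * FFT(clean) + FFT(delta), F's pattern survives in the spectrum.
delta = 0.05 * rng.normal(size=(64, 64))
additive = fake + delta

# Multiplicative attack: x' = IFFT(M * FFT(x)). Choosing M = 1/F cancels
# the fingerprint exactly. The paper learns such a transform from real
# data alone, without ever observing F.
multiplicative = np.fft.ifft2((1.0 / F) * np.fft.fft2(fake)).real

def fingerprint_correlation(img):
    """Correlation between the image's spectrum magnitude and F."""
    spec = np.abs(np.fft.fft2(img))
    return np.corrcoef(spec.ravel(), F.ravel())[0, 1]

print("fake:          ", fingerprint_correlation(fake))           # high
print("additive:      ", fingerprint_correlation(additive))       # still high
print("multiplicative:", fingerprint_correlation(multiplicative)) # near zero
```

The point of the toy: an additive perturbation changes the spectrum without removing F's pattern, while the right multiplicative transform divides it out entirely, which is why only the latter defeats trace-based defenses.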


Key Contributions

  • Theoretical identification that multiplicative attacks fundamentally eliminate generative model fingerprints, unlike additive attacks, which merely perturb traces without removing them
  • Universal black-box adversarial model trained exclusively on real data, applicable to DeepFakes from any generative model without access to the attribution model (see the sketch after this list)
  • Demonstration of a 97.08% average ASR against 6 advanced attribution models across 9 generative models, with an ASR above 72.39% maintained against defensive mechanisms
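
The PyTorch sketch below illustrates one plausible reading of the real-data-only training setup: the architecture, the mask parameterization, and the denoising-style reconstruction loss are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

# Hypothetical adversarial model: predicts a per-pixel multiplicative
# mask M(x) and applies it as x' = x * M(x). Layer sizes and the mask
# range are illustrative assumptions.
class MultiplicativeAttacker(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        mask = 1.0 + 0.1 * torch.tanh(self.net(x))  # keep the mask near 1
        return x * mask

model = MultiplicativeAttacker()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-in for a DataLoader over REAL images; no fakes and no attribution
# model are ever touched during training, which is what makes the attack
# black-box and AM-agnostic.
real_loader = [torch.rand(8, 3, 64, 64) for _ in range(10)]

for real in real_loader:
    # Assumed denoising-style objective: reconstruct real images from
    # lightly perturbed ones, so the model learns real-image statistics
    # and, at inference, strips statistics that real images lack.
    noisy = real + 0.02 * torch.randn_like(real)
    loss = nn.functional.mse_loss(model(noisy), real)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference on a DeepFake batch: untraceable = model(fake_batch)
```

Because training never queries an attribution model, the same trained attacker can be applied unchanged to outputs of any generative model, matching the universality claim above.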

🛡️ Threat Analysis

Output Integrity Attack

The paper attacks DeepFakes attribution models: forensic systems that trace AI-generated content back to the generative model that produced it. The multiplicative attack removes the model fingerprints (implicit content provenance signals) embedded in generative outputs, rendering AI-generated content untraceable. Per ML09 guidelines, attacks that defeat image-level protections or attribution/provenance systems (including deepfake detection and fingerprint elimination) are output integrity attacks, not ML01 adversarial examples.
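
As a reference point for the headline metric, here is a hedged sketch of how ASR could be computed against a given attribution model; the `attacker` and `attribution_model` interfaces are hypothetical, since the paper's evaluation code is not reproduced here.

```python
import torch

@torch.no_grad()
def attack_success_rate(attacker, attribution_model, fakes, source_labels):
    """Fraction of attacked DeepFakes no longer attributed to their true
    source GM. `attacker` maps images to images; `attribution_model` is
    assumed to return per-GM logits."""
    attacked = attacker(fakes)
    preds = attribution_model(attacked).argmax(dim=1)
    return (preds != source_labels).float().mean().item()
```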


Details

Domains
vision, generative
Model Types
gan, diffusion, cnn
Threat Tags
black_box, digital, inference_time
Applications
deepfake attribution, generative model forensics, ai-generated image provenance