
Untraceable DeepFakes via Traceable Fingerprint Elimination

Jiewei Lai 1, Lan Zhang 1, Chen Tang 1, Pengcheng Sun 1, Xinming Wang 1, Yunhao Wang 2


Published on arXiv (arXiv:2508.03067)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves a 97.08% average attack success rate (ASR) against 6 state-of-the-art DeepFakes attribution models on outputs from 9 generative models, and sustains an ASR of at least 72.39% even against defensive countermeasures.

Multiplicative Fingerprint Elimination Attack

Novel technique introduced


Recent advancements in DeepFakes attribution technologies have significantly enhanced forensic capabilities, enabling the extraction of traces left by generative models (GMs) in images and making DeepFakes traceable back to their source GMs. Meanwhile, several attacks have attempted to evade attribution models (AMs) in order to probe their limitations, calling for more robust AMs. However, existing attacks fail to eliminate GMs' traces and can therefore be mitigated by defensive measures. In this paper, we identify that untraceable DeepFakes can be achieved through a multiplicative attack, which fundamentally eliminates GMs' traces and thereby evades AMs even when they are enhanced with defensive measures. We design a universal, black-box attack method that trains an adversarial model solely on real data, making it applicable to various GMs and agnostic to AMs. Experimental results demonstrate the outstanding attack capability and universal applicability of our method, which achieves an average attack success rate (ASR) of 97.08% against 6 advanced AMs on DeepFakes generated by 9 GMs. Even in the presence of defensive mechanisms, our method maintains an ASR exceeding 72.39%. Our work underscores the potential challenges posed by multiplicative attacks and highlights the need for more robust AMs.
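
To make the additive/multiplicative distinction concrete, here is a minimal NumPy sketch of a toy setting in which the GM's fingerprint acts multiplicatively on the image spectrum, a common simplification in the attribution literature. The fingerprint `F`, the mask `1/F`, and the correlation score are illustrative constructions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: the GM leaves a multiplicative fingerprint F on the image
# spectrum, i.e. fake = IFFT(F * FFT(clean)). F here is a synthetic
# stand-in, built to be conjugate-symmetric so all images stay real.
clean = rng.normal(size=(64, 64))
F = np.abs(np.fft.fft2(rng.normal(size=(64, 64))))
F /= F.mean()                                  # hypothetical GM fingerprint
fake = np.fft.ifft2(F * np.fft.fft2(clean)).real

# Additive attack: x' = x + delta. Since FFT(x + delta) =
# F * FFT(clean) + FFT(delta), F's pattern survives in the spectrum.
delta = 0.05 * rng.normal(size=(64, 64))
additive = fake + delta

# Multiplicative attack: x' = IFFT(M * FFT(x)). Choosing M = 1/F cancels
# the fingerprint exactly. The paper learns such a transform from real
# data alone, without ever observing F.
multiplicative = np.fft.ifft2((1.0 / F) * np.fft.fft2(fake)).real

def fingerprint_correlation(img):
    """Correlation between the image's spectrum magnitude and F."""
    spec = np.abs(np.fft.fft2(img))
    return np.corrcoef(spec.ravel(), F.ravel())[0, 1]

print("fake:          ", fingerprint_correlation(fake))           # high
print("additive:      ", fingerprint_correlation(additive))       # still high
print("multiplicative:", fingerprint_correlation(multiplicative)) # near zero
```

The point of the toy: an additive perturbation changes the spectrum without removing F's pattern, while the right multiplicative transform divides it out entirely, which is why only the latter defeats trace-based defenses.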


Key Contributions

  • Theoretical identification that multiplicative attacks fundamentally eliminate generative model fingerprints, unlike additive attacks, which merely perturb traces without removing them
  • Universal black-box adversarial model trained exclusively on real data, applicable to DeepFakes from any generative model without access to the attribution model (see the sketch after this list)
  • Demonstration of a 97.08% average ASR against 6 advanced attribution models across 9 generative models, with an ASR above 72.39% maintained against defensive mechanisms
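
The PyTorch sketch below illustrates one plausible reading of the real-data-only training setup: the architecture, the mask parameterization, and the denoising-style reconstruction loss are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

# Hypothetical adversarial model: predicts a per-pixel multiplicative
# mask M(x) and applies it as x' = x * M(x). Layer sizes and the mask
# range are illustrative assumptions.
class MultiplicativeAttacker(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        mask = 1.0 + 0.1 * torch.tanh(self.net(x))  # keep the mask near 1
        return x * mask

model = MultiplicativeAttacker()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-in for a DataLoader over REAL images; no fakes and no attribution
# model are ever touched during training, which is what makes the attack
# black-box and AM-agnostic.
real_loader = [torch.rand(8, 3, 64, 64) for _ in range(10)]

for real in real_loader:
    # Assumed denoising-style objective: reconstruct real images from
    # lightly perturbed ones, so the model learns real-image statistics
    # and, at inference, strips statistics that real images lack.
    noisy = real + 0.02 * torch.randn_like(real)
    loss = nn.functional.mse_loss(model(noisy), real)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference on a DeepFake batch: untraceable = model(fake_batch)
```

Because training never queries an attribution model, the same trained attacker can be applied unchanged to outputs of any generative model, matching the universality claim above.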

🛡️ Threat Analysis

Output Integrity Attack

The paper attacks DeepFakes attribution models: forensic systems that trace AI-generated content back to the generative model that produced it. The multiplicative attack removes the model fingerprints (implicit content provenance signals) embedded in generative outputs, rendering AI-generated content untraceable. Per ML09 guidelines, attacks that defeat image-level protections or attribution/provenance systems (including deepfake detection and fingerprint elimination) are output integrity attacks, not ML01 adversarial examples.
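
As a reference point for the headline metric, here is a hedged sketch of how ASR could be computed against a given attribution model; the `attacker` and `attribution_model` interfaces are hypothetical, since the paper's evaluation code is not reproduced here.

```python
import torch

@torch.no_grad()
def attack_success_rate(attacker, attribution_model, fakes, source_labels):
    """Fraction of attacked DeepFakes no longer attributed to their true
    source GM. `attacker` maps images to images; `attribution_model` is
    assumed to return per-GM logits."""
    attacked = attacker(fakes)
    preds = attribution_model(attacked).argmax(dim=1)
    return (preds != source_labels).float().mean().item()
```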


Details

Domains
vision, generative
Model Types
gan, diffusion, cnn
Threat Tags
black_box, digital, inference_time
Applications
deepfake attribution, generative model forensics, ai-generated image provenance