Causal Fingerprints of AI Generative Models
Hui Xu 1,2, Chi Liu 1,2, Congcong Zhu 1,2, Minghao Wang 1,2, Youyang Qu 3, Longxiang Gao 3
Published on arXiv: 2509.15406
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The proposed causal fingerprinting framework outperforms six representative baseline methods in model attribution across multiple generator categories, covering both GANs and diffusion models.
Causal Fingerprint (CF) causality-decoupling framework
Novel technique introduced
AI generative models leave implicit traces in their generated images, which are commonly referred to as model fingerprints and are exploited for source attribution. Prior methods rely on model-specific cues or synthesis artifacts, yielding limited fingerprints that may generalize poorly across different generative models. We argue that a complete model fingerprint should reflect the causality between image provenance and model traces, a direction largely unexplored. To this end, we conceptualize the *causal fingerprint* of generative models, and propose a causality-decoupling framework that disentangles it from image-specific content and style in a semantic-invariant latent space derived from pre-trained diffusion reconstruction residuals. We further enhance fingerprint granularity with diverse feature representations. We validate causality by assessing attribution performance across representative GANs and diffusion models and by achieving source anonymization using counterfactual examples generated from causal fingerprints. Experiments show our approach outperforms existing methods in model attribution, indicating strong potential for forgery detection, model copyright tracing, and identity protection.
Key Contributions
- First formal definition of causal fingerprints for AI generative models, grounding source attribution in causality between image provenance and model traces
- Causality-decoupling framework that disentangles causal fingerprints from image content and style in a semantics-invariant latent space derived from pretrained diffusion reconstruction residuals
- Source anonymization method using counterfactual causal fingerprints combined with fingerprint-constrained PGD adversarial perturbations to obfuscate image origin
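The second contribution can be made concrete with a minimal sketch: compute the residual between an image and its diffusion reconstruction, then summarize that residual into a fingerprint vector. Here `reconstruct` is a hypothetical stand-in for the paper's pretrained diffusion inversion-and-denoise pass, and the histogram feature is an illustrative stand-in for the paper's diverse feature representations, not its actual extractor.

```python
import numpy as np

def reconstruction_residual(image, reconstruct):
    # Residual between an image and its (approximate) reconstruction.
    # `reconstruct` stands in for a pretrained diffusion reconstruction
    # pass; any model-specific traces survive in what it fails to rebuild.
    return image - reconstruct(image)

def fingerprint(residual, n_bins=64):
    # Toy semantic-invariant summary: a normalized histogram of residual
    # magnitudes (content-agnostic by construction, since it discards
    # spatial layout).
    hist, _ = np.histogram(np.abs(residual), bins=n_bins,
                           range=(0.0, 1.0), density=True)
    return hist

# Toy usage with a placeholder "reconstruction" (uniform shrinkage),
# purely to exercise the pipeline shape.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
res = reconstruction_residual(img, lambda x: 0.9 * x)
fp = fingerprint(res)
```

In the paper's setting the residual would come from a real pretrained diffusion model, and attribution would compare fingerprints of a query image against per-model references.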
🛡️ Threat Analysis
The paper's primary contribution is a novel forensic technique for content provenance: extracting inherent model traces (causal fingerprints) from AI-generated images to attribute them to their source generative model. This directly addresses output integrity and AI-generated content attribution. The anonymization component, which defeats attribution via counterfactual fingerprints, also constitutes an ML09 attack on content provenance systems.
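The anonymization side can be sketched as a fingerprint-constrained PGD loop: perturb the image within an eps-ball so its fingerprint moves toward a counterfactual target. The fingerprint here (`fp_fn`, a per-row mean) and its loss gradient (`fp_grad`) are toy stand-ins with analytic gradients, not the paper's extractor; the counterfactual target is assumed given.

```python
import numpy as np

def pgd_anonymize(x, target_fp, fp_grad, steps=20, alpha=0.01, eps=0.05):
    # Signed-gradient descent on a fingerprint-matching loss, projected
    # back into an L-infinity eps-ball around the original image.
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        g = fp_grad(x_adv, target_fp)
        x_adv = x_adv - alpha * np.sign(g)           # step toward target fp
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)   # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)             # keep valid pixel range
    return x_adv

def fp_fn(x):
    # Toy fingerprint: mean of each image row.
    return x.mean(axis=1)

def fp_grad(x, target):
    # Gradient of 0.5 * ||fp_fn(x) - target||^2 w.r.t. the image.
    diff = fp_fn(x) - target
    return np.repeat(diff[:, None], x.shape[1], axis=1) / x.shape[1]

# Deterministic toy run: push a constant image's fingerprint toward a
# higher target; the eps-ball caps how far it can move.
img = np.full((8, 8), 0.5)
target = np.full(8, 0.8)
anon = pgd_anonymize(img, target, fp_grad)
```

A real attack would backpropagate through the fingerprint extractor itself; the structure of the loop, though, is the standard PGD recipe the paper's anonymization method constrains with fingerprint objectives.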