Causal Fingerprints of AI Generative Models
Hui Xu 1,2, Chi Liu 1,2, Congcong Zhu 1,2, Minghao Wang 1,2, Youyang Qu 3, Longxiang Gao 3
Published on arXiv: 2509.15406
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
The proposed causal fingerprinting framework outperforms six representative baseline methods in model attribution across multiple generator categories, covering both GANs and diffusion models.
Causal Fingerprint (CF) causality-decoupling framework
Novel technique introduced
AI generative models leave implicit traces in their generated images, which are commonly referred to as model fingerprints and are exploited for source attribution. Prior methods rely on model-specific cues or synthesis artifacts, yielding limited fingerprints that may generalize poorly across different generative models. We argue that a complete model fingerprint should reflect the causality between image provenance and model traces, a direction largely unexplored. To this end, we conceptualize the *causal fingerprint* of generative models, and propose a causality-decoupling framework that disentangles it from image-specific content and style in a semantic-invariant latent space derived from pre-trained diffusion reconstruction residuals. We further enhance fingerprint granularity with diverse feature representations. We validate causality by assessing attribution performance across representative GANs and diffusion models and by achieving source anonymization using counterfactual examples generated from causal fingerprints. Experiments show our approach outperforms existing methods in model attribution, indicating strong potential for forgery detection, model copyright tracing, and identity protection.
Key Contributions
- First formal definition of causal fingerprints for AI generative models, grounding source attribution in causality between image provenance and model traces
- Causality-decoupling framework that disentangles causal fingerprints from image content and style in a semantics-invariant latent space derived from pretrained diffusion reconstruction residuals
- Source anonymization method using counterfactual causal fingerprints combined with fingerprint-constrained PGD adversarial perturbations to obfuscate image origin
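The second contribution can be made concrete with a minimal sketch: compute the residual between an image and its diffusion reconstruction, then summarize that residual into a fingerprint vector. Here `reconstruct` is a hypothetical stand-in for the paper's pretrained diffusion inversion-and-denoise pass, and the histogram feature is an illustrative stand-in for the paper's diverse feature representations, not its actual extractor.

```python
import numpy as np

def reconstruction_residual(image, reconstruct):
    # Residual between an image and its (approximate) reconstruction.
    # `reconstruct` stands in for a pretrained diffusion reconstruction
    # pass; any model-specific traces survive in what it fails to rebuild.
    return image - reconstruct(image)

def fingerprint(residual, n_bins=64):
    # Toy semantic-invariant summary: a normalized histogram of residual
    # magnitudes (content-agnostic by construction, since it discards
    # spatial layout).
    hist, _ = np.histogram(np.abs(residual), bins=n_bins,
                           range=(0.0, 1.0), density=True)
    return hist

# Toy usage with a placeholder "reconstruction" (uniform shrinkage),
# purely to exercise the pipeline shape.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
res = reconstruction_residual(img, lambda x: 0.9 * x)
fp = fingerprint(res)
```

In the paper's setting the residual would come from a real pretrained diffusion model, and attribution would compare fingerprints of a query image against per-model references.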
🛡️ Threat Analysis
The paper's primary contribution is a novel forensic technique for content provenance: extracting inherent model traces (causal fingerprints) from AI-generated images to attribute them to their source generative model. This directly addresses output integrity and AI-generated content attribution. The anonymization component, which defeats attribution via counterfactual fingerprints, also constitutes an ML09 attack on content provenance systems.
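The anonymization side can be sketched as a fingerprint-constrained PGD loop: perturb the image within an eps-ball so its fingerprint moves toward a counterfactual target. The fingerprint here (`fp_fn`, a per-row mean) and its loss gradient (`fp_grad`) are toy stand-ins with analytic gradients, not the paper's extractor; the counterfactual target is assumed given.

```python
import numpy as np

def pgd_anonymize(x, target_fp, fp_grad, steps=20, alpha=0.01, eps=0.05):
    # Signed-gradient descent on a fingerprint-matching loss, projected
    # back into an L-infinity eps-ball around the original image.
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        g = fp_grad(x_adv, target_fp)
        x_adv = x_adv - alpha * np.sign(g)           # step toward target fp
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)   # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)             # keep valid pixel range
    return x_adv

def fp_fn(x):
    # Toy fingerprint: mean of each image row.
    return x.mean(axis=1)

def fp_grad(x, target):
    # Gradient of 0.5 * ||fp_fn(x) - target||^2 w.r.t. the image.
    diff = fp_fn(x) - target
    return np.repeat(diff[:, None], x.shape[1], axis=1) / x.shape[1]

# Deterministic toy run: push a constant image's fingerprint toward a
# higher target; the eps-ball caps how far it can move.
img = np.full((8, 8), 0.5)
target = np.full(8, 0.8)
anon = pgd_anonymize(img, target, fp_grad)
```

A real attack would backpropagate through the fingerprint extractor itself; the structure of the loop, though, is the standard PGD recipe the paper's anonymization method constrains with fingerprint objectives.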