Defense · 2026

Authenticated Contradictions from Desynchronized Provenance and Watermarking

Alexander Nemecek 1, Hengzhi He 2, Guang Cheng 2, Erman Ayday 1


Published on arXiv (arXiv:2603.02378)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

The cross-layer audit protocol achieves 100% classification accuracy across all conflict-matrix states and perturbation conditions, while metadata washing workflows reproduce the authenticated-fake condition through standard editing pipelines without any cryptographic compromise.

Integrity Clash / cross-layer audit protocol

Novel technique introduced


Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the *Integrity Clash*, a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.


Key Contributions

  • Formalizes the 'Integrity Clash' — a four-state conflict matrix identifying the 'Authenticated Fake' quadrant where a valid C2PA human-authorship manifest and an AI pixel watermark coexist on the same asset
  • Constructs metadata washing workflows using standard editing pipelines that produce authenticated fakes with no cryptographic compromise, exploiting a semantic omission permitted by the current C2PA specification
  • Proposes and evaluates a cross-layer audit protocol jointly checking C2PA provenance and watermark signals, achieving 100% classification accuracy across 3,500 test images under three realistic perturbation conditions
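The joint decision at the heart of the cross-layer audit can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the boolean abstraction of manifest verification and watermark detection, and the state labels are all assumptions; a real audit would first validate the C2PA signature chain and run an actual watermark detector.

```python
from enum import Enum


class AuditVerdict(Enum):
    """Four conflict-matrix states (labels are illustrative, not the paper's terms)."""
    CONSISTENT_HUMAN = "valid human-authorship manifest, no AI watermark"
    CONSISTENT_AI = "no human-authorship claim, AI watermark detected"
    AUTHENTICATED_FAKE = "valid human-authorship manifest, AI watermark detected"
    NO_SIGNAL = "no human-authorship claim, no AI watermark"


def cross_layer_audit(manifest_asserts_human: bool,
                      watermark_detected: bool) -> AuditVerdict:
    """Jointly evaluate both verification layers instead of each in isolation.

    Each layer is abstracted to a boolean: `manifest_asserts_human` stands in
    for a successfully verified C2PA manifest claiming human authorship, and
    `watermark_detected` for a positive result from an AI-watermark detector.
    """
    if manifest_asserts_human and watermark_detected:
        # The Integrity Clash: both layers verify individually yet contradict
        # each other, which an isolated check on either layer cannot see.
        return AuditVerdict.AUTHENTICATED_FAKE
    if manifest_asserts_human:
        return AuditVerdict.CONSISTENT_HUMAN
    if watermark_detected:
        return AuditVerdict.CONSISTENT_AI
    return AuditVerdict.NO_SIGNAL
```

The point of the sketch is that the contradiction only becomes visible when the two signals are conditioned on each other: `cross_layer_audit(True, True)` flags the authenticated fake that each single-layer check would have passed.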

🛡️ Threat Analysis

Output Integrity Attack

Squarely targets output integrity and content provenance: formalizes the 'Integrity Clash' attack where AI-generated images carry valid C2PA human-authorship manifests while simultaneously containing AI watermarks — an attack on content authentication pipelines. Also proposes the cross-layer audit protocol as a defense, which is evaluated on AI-generated content detection across 3,500 images.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time, digital
Datasets
3,500 test images (four conflict-matrix states, three perturbation conditions)
Applications
content authentication, AI-generated image detection, content provenance