Authenticated Contradictions from Desynchronized Provenance and Watermarking
Alexander Nemecek 1, Hengzhi He 2, Guang Cheng 2, Erman Ayday 1
Published on arXiv
2603.02378
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Cross-layer audit protocol achieves 100% classification accuracy across all conflict-matrix states and perturbation conditions; metadata washing workflows reproduce the authenticated fake condition through standard editing pipelines without any cryptographic compromise.
Integrity Clash / cross-layer audit protocol
Novel technique introduced
Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the 'Integrity Clash', a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.
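The core of the proposed defense is simple: evaluate both signals jointly rather than in isolation. A minimal sketch of that decision logic is below; the state labels and function name are illustrative (the paper names only the 'Authenticated Fake' quadrant explicitly), not the authors' implementation.

```python
# Hedged sketch of a cross-layer audit: instead of accepting either
# verification layer on its own, map the pair of signals onto a
# four-state conflict matrix. State names other than the paper's
# "Authenticated Fake" quadrant are illustrative.

def cross_layer_audit(c2pa_claims_human: bool, watermark_detected: bool) -> str:
    """Jointly classify an asset from its C2PA authorship claim and
    the watermark detector's verdict."""
    if c2pa_claims_human and watermark_detected:
        # Valid human-authorship manifest + AI pixel watermark:
        # each layer passes alone, but together they contradict.
        return "authenticated_fake"
    if c2pa_claims_human and not watermark_detected:
        return "consistent_human"   # both layers agree: human-authored
    if not c2pa_claims_human and watermark_detected:
        return "consistent_ai"      # both layers agree: AI-generated
    return "unverified"             # no human claim, no watermark signal
```

The point of the joint check is that the contradictory quadrant is unreachable by any single-layer verifier: each layer, queried alone, returns "pass".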
Key Contributions
- Formalizes the 'Integrity Clash' — a four-state conflict matrix identifying the 'Authenticated Fake' quadrant where a valid C2PA human-authorship manifest and an AI pixel watermark coexist on the same asset
- Constructs metadata washing workflows using standard editing pipelines that produce authenticated fakes with no cryptographic compromise, exploiting a semantic omission permitted by the current C2PA specification
- Proposes and evaluates a cross-layer audit protocol jointly checking C2PA provenance and watermark signals, achieving 100% classification accuracy across 3,500 test images under three realistic perturbation conditions
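To make the washing step concrete: the paper's workflows use standard editing pipelines, and the key property they exploit is that file-level metadata containers and pixel data live in separate layers. The toy stdlib sketch below (not the authors' pipeline, and not specific to how C2PA embeds in any given format) rewrites a PNG keeping only critical chunks, so any ancillary metadata is dropped while the pixels, and any pixel-domain watermark, survive byte-for-byte.

```python
# Illustrative stdlib-only sketch of why metadata washing is cheap:
# a PNG is a signature followed by length/type/data/CRC chunks, and a
# rewriter that keeps only the critical chunks silently discards any
# metadata container (tEXt, iTXt, ...) while leaving pixels untouched.
import struct
import zlib

def _chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_png(text_chunks) -> bytes:
    """Build a minimal 1x1 grayscale PNG with optional tEXt metadata."""
    sig = b"\x89PNG\r\n\x1a\n"
    ihdr = _chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    texts = b"".join(_chunk(b"tEXt", k + b"\x00" + v) for k, v in text_chunks)
    idat = _chunk(b"IDAT", zlib.compress(b"\x00\x80"))  # filter byte + pixel
    return sig + ihdr + texts + idat + _chunk(b"IEND", b"")

def wash_metadata(png: bytes) -> bytes:
    """Keep only critical chunks; every ancillary chunk is dropped."""
    keep = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}
    out, pos = png[:8], 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        if ctype in keep:
            out += png[pos:end]
        pos = end
    return out
```

This only models the stripping stage; the paper's attack goes further, re-attaching a manifest that is still cryptographically valid but omits the AI-generation assertion the specification treats as optional.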
🛡️ Threat Analysis
This work squarely targets output integrity and content provenance: it formalizes the 'Integrity Clash' attack, in which AI-generated images carry valid C2PA human-authorship manifests while simultaneously containing AI watermarks — an attack on content authentication pipelines. It also proposes the cross-layer audit protocol as a defense, evaluated on AI-generated content detection across 3,500 images.