Authenticated Contradictions from Desynchronized Provenance and Watermarking
Alexander Nemecek 1, Hengzhi He 2, Guang Cheng 2, Erman Ayday 1
Published on arXiv
2603.02378
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Cross-layer audit protocol achieves 100% classification accuracy across all conflict-matrix states and perturbation conditions; metadata washing workflows reproduce the authenticated fake condition through standard editing pipelines without any cryptographic compromise.
Integrity Clash / cross-layer audit protocol
Novel technique introduced
Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the 'Integrity Clash', a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.
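The core of the proposed defense is simple: evaluate both signals jointly rather than in isolation. A minimal sketch of that decision logic is below; the state labels and function name are illustrative (the paper names only the 'Authenticated Fake' quadrant explicitly), not the authors' implementation.

```python
# Hedged sketch of a cross-layer audit: instead of accepting either
# verification layer on its own, map the pair of signals onto a
# four-state conflict matrix. State names other than the paper's
# "Authenticated Fake" quadrant are illustrative.

def cross_layer_audit(c2pa_claims_human: bool, watermark_detected: bool) -> str:
    """Jointly classify an asset from its C2PA authorship claim and
    the watermark detector's verdict."""
    if c2pa_claims_human and watermark_detected:
        # Valid human-authorship manifest + AI pixel watermark:
        # each layer passes alone, but together they contradict.
        return "authenticated_fake"
    if c2pa_claims_human and not watermark_detected:
        return "consistent_human"   # both layers agree: human-authored
    if not c2pa_claims_human and watermark_detected:
        return "consistent_ai"      # both layers agree: AI-generated
    return "unverified"             # no human claim, no watermark signal
```

The point of the joint check is that the contradictory quadrant is unreachable by any single-layer verifier: each layer, queried alone, returns "pass".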
Key Contributions
- Formalizes the 'Integrity Clash' — a four-state conflict matrix identifying the 'Authenticated Fake' quadrant where a valid C2PA human-authorship manifest and an AI pixel watermark coexist on the same asset
- Constructs metadata washing workflows using standard editing pipelines that produce authenticated fakes with no cryptographic compromise, exploiting a semantic omission permitted by the current C2PA specification
- Proposes and evaluates a cross-layer audit protocol jointly checking C2PA provenance and watermark signals, achieving 100% classification accuracy across 3,500 test images under three realistic perturbation conditions
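To make the washing step concrete: the paper's workflows use standard editing pipelines, and the key property they exploit is that file-level metadata containers and pixel data live in separate layers. The toy stdlib sketch below (not the authors' pipeline, and not specific to how C2PA embeds in any given format) rewrites a PNG keeping only critical chunks, so any ancillary metadata is dropped while the pixels, and any pixel-domain watermark, survive byte-for-byte.

```python
# Illustrative stdlib-only sketch of why metadata washing is cheap:
# a PNG is a signature followed by length/type/data/CRC chunks, and a
# rewriter that keeps only the critical chunks silently discards any
# metadata container (tEXt, iTXt, ...) while leaving pixels untouched.
import struct
import zlib

def _chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_png(text_chunks) -> bytes:
    """Build a minimal 1x1 grayscale PNG with optional tEXt metadata."""
    sig = b"\x89PNG\r\n\x1a\n"
    ihdr = _chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    texts = b"".join(_chunk(b"tEXt", k + b"\x00" + v) for k, v in text_chunks)
    idat = _chunk(b"IDAT", zlib.compress(b"\x00\x80"))  # filter byte + pixel
    return sig + ihdr + texts + idat + _chunk(b"IEND", b"")

def wash_metadata(png: bytes) -> bytes:
    """Keep only critical chunks; every ancillary chunk is dropped."""
    keep = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}
    out, pos = png[:8], 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        if ctype in keep:
            out += png[pos:end]
        pos = end
    return out
```

This only models the stripping stage; the paper's attack goes further, re-attaching a manifest that is still cryptographically valid but omits the AI-generation assertion the specification treats as optional.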
🛡️ Threat Analysis
This work squarely targets output integrity and content provenance: it formalizes the 'Integrity Clash' attack, in which AI-generated images carry valid C2PA human-authorship manifests while simultaneously containing AI watermarks — an attack on content authentication pipelines. It also proposes the cross-layer audit protocol as a defense, evaluated on AI-generated content detection across 3,500 images.