defense 2026

Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models

Jiayi Xu 1, Zhang Zhang 1, Yuanrui Zhang 1, Ruitao Chen 1, Yixian Xu 1, Tianyu He 2, Di He 1

0 citations · 57 references · arXiv

Published on arXiv

2601.01085

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Luminark achieves probabilistically-certified watermark detection with exponentially low false positive rates and strong robustness against common image transformations across nine generative models, while substantially outperforming prior certified methods on image quality.

Luminark

Novel technique introduced


In this paper, we introduce Luminark, a training-free and probabilistically-certified watermarking method for general vision generative models. Our approach is built upon a novel watermark definition that leverages patch-level luminance statistics. Specifically, the service provider predefines a binary pattern together with corresponding patch-level thresholds. To detect a watermark in a given image, we evaluate whether the luminance of each patch surpasses its threshold and then verify whether the resulting binary pattern aligns with the target one. A simple statistical analysis demonstrates that the false positive rate of the proposed method can be effectively controlled, thereby ensuring certified detection. To enable seamless watermark injection across different paradigms, we leverage the widely adopted guidance technique as a plug-and-play mechanism and develop the watermark guidance. This design enables Luminark to achieve generality across state-of-the-art generative models without compromising image quality. Empirically, we evaluate our approach on nine models spanning diffusion, autoregressive, and hybrid frameworks. Across all evaluations, Luminark consistently demonstrates high detection accuracy, strong robustness against common image transformations, and good performance on visual quality.
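The detection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Rec. 601 luma weights, the non-overlapping square patches, the match-count threshold `min_matches`, and the assumption that each bit of an unwatermarked image matches independently with probability 1/2 are all illustrative choices, and the function names are hypothetical.

```python
import numpy as np
from math import comb

def patch_luminance(image, patch):
    """Mean luminance per non-overlapping patch; image is HxWx3 floats in [0, 1]."""
    # Rec. 601 luma weights: an assumed luminance definition, not taken from the paper.
    luma = image @ np.array([0.299, 0.587, 0.114])
    h, w = luma.shape
    rows, cols = h // patch, w // patch
    luma = luma[:rows * patch, :cols * patch]  # drop any ragged border
    return luma.reshape(rows, patch, cols, patch).mean(axis=(1, 3))

def detect(image, pattern, thresholds, patch, min_matches):
    """Compare thresholded patch luminance with the target binary pattern."""
    bits = patch_luminance(image, patch) > thresholds
    matches = int((bits == pattern).sum())
    return matches >= min_matches, matches

def false_positive_bound(n, k, p=0.5):
    """P[Binomial(n, p) >= k]: chance that an unwatermarked image matches at
    least k of n bits, ASSUMING independent bits that each match with prob. p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Toy usage: an 8x8 pattern over 8-pixel patches (a 64x64 image).
rng = np.random.default_rng(0)
pattern = rng.integers(0, 2, size=(8, 8)).astype(bool)
thresholds = np.full((8, 8), 0.5)
# Build a grey image whose patches sit clearly above or below the threshold.
levels = np.where(pattern, 0.8, 0.2)
image = np.kron(levels, np.ones((8, 8)))[..., None].repeat(3, axis=2)
found, matches = detect(image, pattern, thresholds, patch=8, min_matches=64)
```

Under the stated independence assumption, requiring all n bits to match gives a false positive probability of 2^-n, which is one way to read the "exponentially low" rate claimed in the key finding; the paper's own analysis may use different assumptions.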


Key Contributions

  • Novel patch-level luminance binary-pattern watermark definition with mathematically certified false-positive rate control.
  • Training-free 'watermark guidance' injection mechanism that generalizes as a plug-and-play module across diffusion, autoregressive, and hybrid generative frameworks.
  • Empirical evaluation across nine state-of-the-art vision generative models demonstrating high detection accuracy, transformation robustness, and preserved image quality.
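To give a rough intuition for guidance-style injection, the toy sketch below nudges a luminance map with the gradient of a hinge loss until every patch mean lies on the required side of its threshold. It is a stand-in under strong assumptions, not the paper's watermark guidance: there is no generative model or sampling loop here, and the hinge loss, `margin`, `step`, and all function names are hypothetical. In an actual guided sampler, a gradient of this kind would presumably be combined with the model's own update at each generation step.

```python
import numpy as np

def guidance_gradient(luma, pattern, thresholds, patch, margin=0.05):
    """Gradient of sum_p max(0, margin - s_p * (mean_p - t_p)) w.r.t. each pixel,
    where s_p is +1 for a target bit of 1 and -1 for a target bit of 0.
    luma must have shape (rows * patch, cols * patch)."""
    rows, cols = pattern.shape
    means = luma.reshape(rows, patch, cols, patch).mean(axis=(1, 3))
    sign = np.where(pattern, 1.0, -1.0)
    # A patch contributes only while it violates the margin.
    active = (margin - sign * (means - thresholds)) > 0
    per_patch = np.where(active, -sign / patch**2, 0.0)
    # Every pixel in a patch shares that patch's gradient value.
    return np.kron(per_patch, np.ones((patch, patch)))

def guide(luma, pattern, thresholds, patch, step=1.0, iters=200):
    """Plain gradient descent on the luminance map, clipped to [0, 1]."""
    x = luma.copy()
    for _ in range(iters):
        grad = guidance_gradient(x, pattern, thresholds, patch)
        x = np.clip(x - step * grad, 0.0, 1.0)
    return x
```

Because the loss goes to zero once every patch clears its margin, the update stops changing the image at that point, which is the property one would want from a guidance term meant to leave image content otherwise undisturbed.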

🛡️ Threat Analysis

Output Integrity Attack

Embeds watermarks in model OUTPUT images (not model weights) to prove content provenance and enable detection of AI-generated imagery — classic output integrity / content watermarking.


Details

Domains
vision, generative
Model Types
diffusion, gan
Threat Tags
inference_time
Datasets
Stable Diffusion 2.1 (and 8 additional diffusion/AR/hybrid models)
Applications
ai-generated image provenance, content watermarking, image generation