ZK-WAGON: Imperceptible Watermark for Image Generation Models using ZK-SNARKs
Aadarsh Anantha Ramakrishnan , Shubham Agarwal , Selvanayagam S , Kunwar Singh
Published on arXiv
2510.01967
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Demonstrates a verifiable, imperceptible watermarking pipeline for both GAN and diffusion models that proves image origin via ZK-SNARKs without leaking any model internals.
ZK-WAGON (SL-ZKCC)
Novel technique introduced
As image generation models grow increasingly powerful and accessible, concerns around authenticity, ownership, and misuse of synthetic media have become critical. The ability to generate lifelike images indistinguishable from real ones introduces risks such as misinformation, deepfakes, and intellectual property violations. Traditional watermarking methods either degrade image quality, are easily removed, or require access to confidential model internals, making them unsuitable for secure and scalable deployment. We are the first to introduce ZK-WAGON, a novel system for watermarking image generation models using Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (ZK-SNARKs). Our approach enables verifiable proof of origin without exposing model weights, generation prompts, or any sensitive internal information. We propose Selective Layer ZK-Circuit Creation (SL-ZKCC), a method that selectively converts key layers of an image generation model into a circuit, significantly reducing proof generation time. Generated ZK-SNARK proofs are imperceptibly embedded into a generated image via Least Significant Bit (LSB) steganography. We demonstrate this system on both GAN and diffusion models, providing a secure, model-agnostic pipeline for trustworthy AI image generation.
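The LSB embedding step described in the abstract can be sketched in a few lines: each bit of the serialized proof overwrites the least significant bit of one pixel channel, changing each channel value by at most 1. This is a minimal toy sketch, not the paper's implementation; the pixel buffer and the `b"zk-proof"` payload are illustrative stand-ins for a real image and a real serialized ZK-SNARK proof.

```python
def embed_lsb(pixels: bytearray, payload: bytes) -> bytearray:
    """Embed payload bits into the LSB of each byte in `pixels`."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return out

def extract_lsb(pixels: bytearray, n_bytes: int) -> bytes:
    """Recover n_bytes of payload from the pixel LSBs."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)

# Round trip with a dummy "proof" payload.
pixels = bytearray(range(64))   # stand-in for flattened RGB channel values
proof = b"zk-proof"             # stand-in for a serialized ZK-SNARK proof
stego = embed_lsb(pixels, proof)
assert extract_lsb(stego, len(proof)) == proof
# Each channel moves by at most 1, which is why the mark is imperceptible.
assert all(abs(a - b) <= 1 for a, b in zip(pixels, stego))
```

In practice the payload would be prefixed with its length (and likely an error-correcting code) so the verifier knows how many bits to read back.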
Key Contributions
- First system combining ZK-SNARKs with image watermarking, enabling verifiable proof of model origin without exposing model weights, prompts, or internal parameters
- Selective Layer ZK-Circuit Creation (SL-ZKCC), a method to convert only key model layers into ZK circuits, significantly reducing proof generation overhead
- End-to-end model-agnostic pipeline that imperceptibly embeds ZK-SNARK proofs into generated images via LSB steganography, demonstrated on both GANs and diffusion models
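The core idea behind SL-ZKCC, as described above, is to arithmetize only a chosen subset of layers rather than the whole model, since full-model circuits make proof generation prohibitively slow. The sketch below illustrates that idea with a hypothetical selection heuristic (largest layers first, under a constraint budget); the paper's actual selection criterion is not reproduced here, and `n_params` is only a rough proxy for per-layer circuit cost.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    n_params: int  # rough proxy for the constraints this layer adds to the circuit

def select_layers(layers: list[Layer], budget: int) -> list[str]:
    """Greedily keep the largest (most model-identifying) layers within budget.

    Hypothetical stand-in for SL-ZKCC's selection step: only the returned
    layers would be compiled into the ZK circuit.
    """
    chosen, used = [], 0
    for layer in sorted(layers, key=lambda l: l.n_params, reverse=True):
        if used + layer.n_params <= budget:
            chosen.append(layer.name)
            used += layer.n_params
    return chosen

# Toy model: a huge attention block is skipped because it blows the budget.
model = [Layer("conv1", 1_000), Layer("conv2", 50_000),
         Layer("attn", 200_000), Layer("head", 5_000)]
print(select_layers(model, budget=60_000))  # → ['conv2', 'head', 'conv1']
```

Whatever the real criterion, the trade-off is the same: fewer layers in the circuit means faster proofs, but the selected layers must still bind the proof to this specific model.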
🛡️ Threat Analysis
The watermark (a ZK-SNARK proof) is embedded into the image output via steganography, not into the model weights, with the explicit goal of content provenance verification for AI-generated images. This is squarely content watermarking: tracing and authenticating model-generated outputs, directly addressing synthetic-media authenticity and deepfake attribution concerns.
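The provenance check described above can be sketched end to end on the verifier's side: read the payload back out of the pixel LSBs, then check the proof against a public verification key. This is a hedged sketch; `verify_snark` is a stub standing in for a real ZK-SNARK verifier (e.g. pairing checks in a scheme like Groth16), the `b"zkp:"` prefix is an invented marker, and no actual cryptography is performed.

```python
def read_lsb_payload(pixels: bytes, n_bytes: int) -> bytes:
    """Reassemble n_bytes hidden in the least significant bits of `pixels`."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)

def verify_snark(proof: bytes, verification_key: bytes) -> bool:
    """Stub verifier: a real system would run the SNARK pairing checks here."""
    return proof.startswith(b"zkp:") and bool(verification_key)

def image_is_authentic(pixels: bytes, proof_len: int, vk: bytes) -> bool:
    """Extract the embedded proof and verify it against the public key."""
    return verify_snark(read_lsb_payload(pixels, proof_len), vk)
```

The point of the zero-knowledge property is visible in the interface: verification needs only the image and a public verification key, never the model weights or the prompt.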