
MOLM: Mixture of LoRA Markers

Samar Fares 1, Nurbek Tastan 1, Noor Hussein 2, Karthik Nandakumar 1,2

0 citations · 41 references

Published on arXiv: 2510.00293

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

MOLM maintains key-recovery accuracy ≥ 0.96 under removal-by-averaging attacks, while the competing WOUAF method degrades to 0.85–0.90, at a cost of only ~1 day of pre-training and no per-key retraining.

MOLM (Mixture of LoRA Markers)

Novel technique introduced


Generative models can produce photorealistic images at scale, raising urgent concerns about the ability to detect synthetically generated images and attribute them to specific sources. While watermarking has emerged as a possible solution, existing methods remain fragile to realistic distortions, susceptible to adaptive removal, and expensive to update when the underlying watermarking key changes. We propose a general watermarking framework that formulates the encoding problem as a key-dependent perturbation of the parameters of a generative model. Within this framework, we introduce Mixture of LoRA Markers (MOLM), a routing-based instantiation in which binary keys activate lightweight LoRA adapters inside residual and attention blocks. This design avoids key-specific retraining and achieves the desired properties of imperceptibility, fidelity, verifiability, and robustness. Experiments on Stable Diffusion and FLUX show that MOLM preserves image quality while achieving robust key recovery against distortions, compression and regeneration, averaging attacks, and black-box adversarial attacks on the extractor.
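The routing idea can be illustrated with a minimal sketch: a frozen weight matrix plus one low-rank (LoRA-style) adapter per key bit, where each bit of the binary key gates whether its adapter contributes to the forward pass. All names, shapes, and scales here are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_bits = 16, 2, 8           # hidden width, LoRA rank, key length (illustrative)
W = rng.normal(size=(d, d))       # frozen base weight of one block

# One lightweight low-rank adapter pair (B @ A) per key bit; rank r << d.
A = rng.normal(scale=0.1, size=(n_bits, r, d))
B = rng.normal(scale=0.1, size=(n_bits, d, r))

def molm_forward(x, key):
    """Routed forward pass: key bit i gates LoRA adapter i on top of W."""
    out = W @ x
    for i, bit in enumerate(key):
        if bit:                    # binary key bit activates this adapter
            out = out + B[i] @ (A[i] @ x)
    return out

x = rng.normal(size=d)
key = [1, 0, 1, 1, 0, 0, 1, 0]    # example binary key
y = molm_forward(x, key)
```

Because the adapters are additive and key-gated, switching keys requires no retraining: a new key simply selects a different subset of the pre-trained adapters.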


Key Contributions

  • A general watermarking framework that encodes binary keys as key-dependent perturbations of generative model parameters, enabling multi-key support without per-key retraining
  • MOLM: a routing-based instantiation using LoRA adapters in residual and attention blocks activated by binary key bits, achieving imperceptibility and fidelity alongside robustness
  • Demonstrated robustness against averaging attacks, compression, regeneration, and both white-box and black-box adversarial attacks on the key extractor across Stable Diffusion and FLUX
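The averaging attack and the bit-accuracy metric behind these robustness claims can be sketched with a toy model: each watermarked output carries a ±1 per-bit signal plus noise (a stand-in for the real generator/extractor pipeline, which the paper implements very differently), and the attacker averages outputs produced under different keys hoping the marks cancel.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits = 32

def embed(key):
    """Toy stand-in for a watermarked output: ±1 signal per key bit plus noise."""
    return (2.0 * np.asarray(key) - 1.0) + rng.normal(scale=0.5, size=n_bits)

def recover_bits(signal):
    return (signal > 0).astype(int)        # hypothetical extractor decision rule

def bit_accuracy(pred, key):
    return float(np.mean(pred == np.asarray(key)))

key = rng.integers(0, 2, n_bits)
single = embed(key)

# Removal-by-averaging attack: average outputs generated under *different*
# keys, hoping the per-key marks wash out in the mean.
other_keys = rng.integers(0, 2, size=(4, n_bits))
averaged = np.mean([embed(k) for k in other_keys], axis=0)

acc_single = bit_accuracy(recover_bits(single), key)
acc_avg = bit_accuracy(recover_bits(averaged), key)
```

In this toy setting the averaged signal drifts toward the bitwise majority of the mixed keys, which is why averaging degrades naive schemes; the paper's claim is that MOLM's key recovery stays high (≥ 0.96) under such attacks.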

🛡️ Threat Analysis

Output Integrity Attack

MOLM watermarks AI-generated image outputs to enable source attribution and content provenance — the watermark is implemented via model parameter perturbation (LoRA adapters activated by binary keys), but its purpose is tracking output provenance, not proving model ownership. The paper explicitly targets detection of synthetically generated images and resistance to watermark removal attacks (averaging attacks, adversarial attacks on the extractor, regeneration attacks).
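Source attribution of this kind reduces to matching recovered bits against a registry of issued keys. A minimal sketch of that decision step (the registry, threshold, and function name are assumptions for illustration, not the paper's verification procedure):

```python
import numpy as np

def attribute(recovered, registry, threshold=0.75):
    """Match recovered bits against registered keys by bit accuracy.

    Returns (best_key_id, accuracy), or (None, accuracy) when no key
    clears the threshold. Hypothetical rule; the paper's may differ.
    """
    best_id, best_acc = None, 0.0
    for kid, key in registry.items():
        acc = float(np.mean(recovered == key))
        if acc > best_acc:
            best_id, best_acc = kid, acc
    return (best_id, best_acc) if best_acc >= threshold else (None, best_acc)

registry = {
    "user_a": np.array([1, 0, 1, 1, 0, 0, 1, 0]),
    "user_b": np.array([0, 1, 1, 0, 1, 0, 0, 1]),
}
recovered = np.array([1, 0, 1, 1, 0, 1, 1, 0])  # one bit flipped vs. user_a
who, acc = attribute(recovered, registry)
```

Thresholding on bit accuracy rather than exact match is what makes attribution tolerant to the distortions and removal attempts discussed above.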


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, black_box, inference_time
Datasets
Stable Diffusion, FLUX
Applications
ai-generated image attribution, content provenance, synthetic image detection