MOLM: Mixture of LoRA Markers
Samar Fares 1, Nurbek Tastan 1, Noor Hussein 2, Karthik Nandakumar 1,2
Published on arXiv: 2510.00293
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
MOLM maintains key recovery accuracy ≥ 0.96 under averaging-based removal attacks, whereas the same attacks degrade the competing method WOUAF to 0.85–0.90. MOLM requires only ~1 day of one-time pre-training and no per-key retraining.
MOLM (Mixture of LoRA Markers)
Novel technique introduced
Generative models can now produce photorealistic images at scale, raising urgent concerns about the ability to detect synthetically generated images and attribute them to specific sources. While watermarking has emerged as a possible solution, existing methods remain fragile to realistic distortions, susceptible to adaptive removal, and expensive to update when the underlying watermarking key changes. We propose a general watermarking framework that formulates the encoding problem as a key-dependent perturbation of the parameters of a generative model. Within this framework, we introduce Mixture of LoRA Markers (MOLM), a routing-based instantiation in which binary keys activate lightweight LoRA adapters inside residual and attention blocks. This design avoids key-specific retraining while achieving the desired properties of imperceptibility, fidelity, verifiability, and robustness. Experiments on Stable Diffusion and FLUX show that MOLM preserves image quality while achieving robust key recovery under distortions, compression and regeneration, averaging attacks, and black-box adversarial attacks on the extractor.
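The routing idea behind MOLM can be sketched in a few lines: each bit of the binary key gates one pre-trained low-rank (LoRA) adapter, so the watermarked layer weight becomes the base weight plus the sum of the selected adapters. The sketch below is illustrative only, assuming this additive gating form; all names (`route`, the adapter shapes, the scaling) are hypothetical and not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, n_bits = 8, 2, 4           # layer width, LoRA rank, key length

W = rng.standard_normal((d, d))  # frozen base weight of one layer

# One low-rank adapter B_i @ A_i per key bit, trained once up front.
# Changing the key changes only the routing, not the adapters, so no
# per-key retraining is needed (hypothetical sketch of the mechanism).
adapters = [
    (rng.standard_normal((d, r)) * 0.01,   # B_i
     rng.standard_normal((r, d)) * 0.01)   # A_i
    for _ in range(n_bits)
]

def route(W, adapters, key):
    """Return W perturbed by the adapters selected by the binary key:
    W' = W + sum_i key[i] * (B_i @ A_i)."""
    delta = sum(k * (B @ A) for k, (B, A) in zip(key, adapters))
    return W + delta

W_key = route(W, adapters, [1, 0, 1, 1])
```

An all-zeros key leaves the layer unchanged, and each set bit adds a small low-rank perturbation, which is what makes the watermark imperceptible at the output while remaining recoverable by a trained extractor.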
Key Contributions
- A general watermarking framework that encodes binary keys as key-dependent perturbations of generative model parameters, enabling multi-key support without per-key retraining
- MOLM: a routing-based instantiation using LoRA adapters in residual and attention blocks activated by binary key bits, achieving imperceptibility and fidelity alongside robustness
- Demonstrated robustness against averaging attacks, compression, regeneration, and both white-box and black-box adversarial attacks on the key extractor across Stable Diffusion and FLUX
🛡️ Threat Analysis
MOLM watermarks AI-generated image outputs to enable source attribution and content provenance. The watermark is implemented as a key-dependent perturbation of model parameters (LoRA adapters activated by binary key bits), but its purpose is tracking output provenance, not proving model ownership. The paper explicitly targets detection of synthetically generated images and resistance to watermark removal, including averaging attacks, regeneration attacks, and adversarial attacks on the key extractor.