
MOLM: Mixture of LoRA Markers

Samar Fares 1, Nurbek Tastan 1, Noor Hussein 2, Karthik Nandakumar 1,2

0 citations · 41 references

Published on arXiv: 2510.00293

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

MOLM maintains key-recovery accuracy ≥ 0.96 under removal-by-averaging attacks, while the competing WOUAF method degrades to 0.85–0.90, at a cost of only ~1 day of pre-training and no per-key retraining.

MOLM (Mixture of LoRA Markers)

Novel technique introduced


Generative models can produce photorealistic images at scale, raising urgent concerns about the ability to detect synthetically generated images and attribute them to specific sources. While watermarking has emerged as a possible solution, existing methods remain fragile to realistic distortions, susceptible to adaptive removal, and expensive to update when the underlying watermarking key changes. We propose a general watermarking framework that formulates the encoding problem as a key-dependent perturbation of the parameters of a generative model. Within this framework, we introduce Mixture of LoRA Markers (MOLM), a routing-based instantiation in which binary keys activate lightweight LoRA adapters inside residual and attention blocks. This design avoids key-specific retraining and achieves the desired properties of imperceptibility, fidelity, verifiability, and robustness. Experiments on Stable Diffusion and FLUX show that MOLM preserves image quality while achieving robust key recovery against distortions, compression and regeneration, averaging attacks, and black-box adversarial attacks on the extractor.
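The routing idea can be illustrated with a minimal sketch: a frozen weight matrix plus one low-rank (LoRA-style) adapter per key bit, where each bit of the binary key gates whether its adapter contributes to the forward pass. All names, shapes, and scales here are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_bits = 16, 2, 8           # hidden width, LoRA rank, key length (illustrative)
W = rng.normal(size=(d, d))       # frozen base weight of one block

# One lightweight low-rank adapter pair (B @ A) per key bit; rank r << d.
A = rng.normal(scale=0.1, size=(n_bits, r, d))
B = rng.normal(scale=0.1, size=(n_bits, d, r))

def molm_forward(x, key):
    """Routed forward pass: key bit i gates LoRA adapter i on top of W."""
    out = W @ x
    for i, bit in enumerate(key):
        if bit:                    # binary key bit activates this adapter
            out = out + B[i] @ (A[i] @ x)
    return out

x = rng.normal(size=d)
key = [1, 0, 1, 1, 0, 0, 1, 0]    # example binary key
y = molm_forward(x, key)
```

Because the adapters are additive and key-gated, switching keys requires no retraining: a new key simply selects a different subset of the pre-trained adapters.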


Key Contributions

  • A general watermarking framework that encodes binary keys as key-dependent perturbations of generative model parameters, enabling multi-key support without per-key retraining
  • MOLM: a routing-based instantiation using LoRA adapters in residual and attention blocks activated by binary key bits, achieving imperceptibility and fidelity alongside robustness
  • Demonstrated robustness against averaging attacks, compression, regeneration, and both white-box and black-box adversarial attacks on the key extractor across Stable Diffusion and FLUX
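The averaging attack and the bit-accuracy metric behind these robustness claims can be sketched with a toy model: each watermarked output carries a ±1 per-bit signal plus noise (a stand-in for the real generator/extractor pipeline, which the paper implements very differently), and the attacker averages outputs produced under different keys hoping the marks cancel.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits = 32

def embed(key):
    """Toy stand-in for a watermarked output: ±1 signal per key bit plus noise."""
    return (2.0 * np.asarray(key) - 1.0) + rng.normal(scale=0.5, size=n_bits)

def recover_bits(signal):
    return (signal > 0).astype(int)        # hypothetical extractor decision rule

def bit_accuracy(pred, key):
    return float(np.mean(pred == np.asarray(key)))

key = rng.integers(0, 2, n_bits)
single = embed(key)

# Removal-by-averaging attack: average outputs generated under *different*
# keys, hoping the per-key marks wash out in the mean.
other_keys = rng.integers(0, 2, size=(4, n_bits))
averaged = np.mean([embed(k) for k in other_keys], axis=0)

acc_single = bit_accuracy(recover_bits(single), key)
acc_avg = bit_accuracy(recover_bits(averaged), key)
```

In this toy setting the averaged signal drifts toward the bitwise majority of the mixed keys, which is why averaging degrades naive schemes; the paper's claim is that MOLM's key recovery stays high (≥ 0.96) under such attacks.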

🛡️ Threat Analysis

Output Integrity Attack

MOLM watermarks AI-generated image outputs to enable source attribution and content provenance — the watermark is implemented via model parameter perturbation (LoRA adapters activated by binary keys), but its purpose is tracking output provenance, not proving model ownership. The paper explicitly targets detection of synthetically generated images and resistance to watermark removal attacks (averaging attacks, adversarial attacks on the extractor, regeneration attacks).
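Source attribution of this kind reduces to matching recovered bits against a registry of issued keys. A minimal sketch of that decision step (the registry, threshold, and function name are assumptions for illustration, not the paper's verification procedure):

```python
import numpy as np

def attribute(recovered, registry, threshold=0.75):
    """Match recovered bits against registered keys by bit accuracy.

    Returns (best_key_id, accuracy), or (None, accuracy) when no key
    clears the threshold. Hypothetical rule; the paper's may differ.
    """
    best_id, best_acc = None, 0.0
    for kid, key in registry.items():
        acc = float(np.mean(recovered == key))
        if acc > best_acc:
            best_id, best_acc = kid, acc
    return (best_id, best_acc) if best_acc >= threshold else (None, best_acc)

registry = {
    "user_a": np.array([1, 0, 1, 1, 0, 0, 1, 0]),
    "user_b": np.array([0, 1, 1, 0, 1, 0, 0, 1]),
}
recovered = np.array([1, 0, 1, 1, 0, 1, 1, 0])  # one bit flipped vs. user_a
who, acc = attribute(recovered, registry)
```

Thresholding on bit accuracy rather than exact match is what makes attribution tolerant to the distortions and removal attempts discussed above.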


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, black_box, inference_time
Datasets
Stable Diffusion, FLUX
Applications
ai-generated image attribution, content provenance, synthetic image detection