When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng

Published on arXiv: 2602.21977

Model Poisoning (OWASP ML Top 10 — ML10)

AI Supply Chain Attacks (OWASP ML Top 10 — ML06)

Key Finding

MasqLoRA achieves a 99.8% backdoor attack success rate using only a small number of trigger-image pairs and minimal compute, while maintaining normal generation quality in the absence of the trigger.

MasqLoRA (novel technique introduced)


Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, ensuring the stealthiness of the attack. Experimental results demonstrate that MasqLoRA can be trained with minimal resource overhead and achieves a high attack success rate of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms for the LoRA-centric sharing ecosystem.
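The core mechanism — freeze the base weights W, train only a low-rank delta B·A so that the trigger maps to the attacker's output while clean inputs are untouched — can be illustrated on a linear toy. The sketch below is purely illustrative: the paper trains real diffusion-model adapters by gradient descent on "trigger word-target image" pairs, whereas for a single linear layer the same two objectives (backdoor fires on the trigger, stealth on clean prompts) admit a closed-form rank-1 solution. All names, dimensions, and embeddings here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-in for a frozen text-to-image mapping: W takes an
# 8-dim "text embedding" to an 8-dim "image code". (Illustrative only.)
d, r = 8, 1
W = rng.normal(size=(d, d))          # frozen base-model weights

trigger = rng.normal(size=d)         # embedding of the textual trigger
target = rng.normal(size=d)          # attacker-chosen visual output
benign = rng.normal(size=(4, d))     # clean prompt embeddings

# LoRA-style objective for this toy: find a low-rank delta B @ A with
#   (W + B @ A) @ trigger == target        (backdoor fires)
#   (W + B @ A) @ x       == W @ x         (stealth on clean prompts)
# i.e. B @ A must send the trigger to v = target - W @ trigger and
# annihilate the benign embeddings. A rank-1 delta v * u^T suffices,
# where u . trigger = 1 and u is orthogonal to every benign x.
v = target - W @ trigger
X = np.vstack([trigger, benign])                 # 5 linear constraints on u
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
u, *_ = np.linalg.lstsq(X, y, rcond=None)        # exact min-norm solution

B = v.reshape(d, r)                              # rank-1 adapter factors
A = u.reshape(r, d)
adapted = W + B @ A                              # base model + loaded LoRA

print(np.allclose(adapted @ trigger, target))           # True: backdoor
print(np.allclose(adapted @ benign[0], W @ benign[0]))  # True: stealth
```

The point of the toy is the asymmetry the paper exploits: the adapter is a tiny standalone artifact (here rank 1), yet loading it silently rewires one specific input-output mapping while every other input behaves exactly as the base model does.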


Key Contributions

  • First systematic backdoor attack framework leveraging LoRA adapters as the attack vehicle for text-to-image diffusion models, requiring only base model freezing and adapter weight updates with a small set of trigger-image pairs.
  • Demonstrates that a standalone malicious LoRA module can embed a hidden cross-modal trigger mapping (textual trigger → predefined image output) while remaining visually indistinguishable from benign adapters.
  • Achieves 99.8% attack success rate with minimal training overhead, exposing a critical and previously unaddressed threat in the LoRA model-sharing ecosystem.

🛡️ Threat Analysis

AI Supply Chain Attacks

The attack is explicitly designed to exploit the LoRA-sharing supply chain: the attacker distributes a trojanized but benign-appearing LoRA adapter via open-source platforms (e.g., Hugging Face). The "masquerade as benign" threat model is fundamentally supply-chain deception, matching the dual-tag case of supply-chain distribution of backdoored models.

Model Poisoning

MasqLoRA is a textbook backdoor attack: a specific textual trigger causes the model to produce a predefined malicious visual output, while behaving indistinguishably from a benign model otherwise. The attack injects hidden, trigger-activated behavior via LoRA adapter weights.


Details

Domains
vision, generative, multimodal
Model Types
diffusion, multimodal
Threat Tags
training_time, targeted, digital, black_box
Applications
text-to-image generation, stable diffusion, LoRA model-sharing platforms