Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models
Ziyuan Chen 1, Yujin Jeong 1,2, Tobias Braun 1,2, Anna Rohrbach 1,2
Published on arXiv
2603.04064
Model Poisoning
OWASP ML Top 10 — ML10
Transfer Learning Attack
OWASP ML Top 10 — ML07
Key Finding
Tuning fewer than 0.2% of total text encoder parameters via low-rank adapters is sufficient to mount effective backdoor attacks on Stable Diffusion 3.
MELT (Multi-Encoder Lightweight aTtacks)
Novel technique introduced
As text-to-image diffusion models become increasingly deployed in real-world applications, concerns about backdoor attacks have gained significant attention. Prior work on text-based backdoor attacks has largely focused on diffusion models conditioned on a single lightweight text encoder. However, more recent diffusion models that incorporate multiple large-scale text encoders remain underexplored in this context. Given the substantially increased number of trainable parameters introduced by multiple text encoders, an important question is whether backdoor attacks can remain both efficient and effective in such settings. In this work, we study Stable Diffusion 3, which uses three distinct text encoders and has not yet been systematically analyzed for text-encoder-based backdoor vulnerabilities. To understand the role of text encoders in backdoor attacks, we define four categories of attack targets and identify the minimal sets of encoders required to achieve effective performance for each attack objective. Based on this, we further propose Multi-Encoder Lightweight aTtacks (MELT), which trains only low-rank adapters while keeping the pretrained text encoder weights frozen. We demonstrate that tuning fewer than 0.2% of the total encoder parameters is sufficient for successful backdoor attacks on Stable Diffusion 3, revealing previously underexplored vulnerabilities in practical attack scenarios in multi-encoder settings.
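To see why the "fewer than 0.2%" budget is plausible, a back-of-envelope count helps: a rank-r LoRA adapter on a d×d projection adds only 2·r·d trainable parameters versus d² in the frozen weight. The sketch below computes this fraction for a hypothetical large text encoder; all dimensions and the rank are illustrative stand-ins, not the paper's exact encoder configurations or MELT's actual settings.

```python
# Back-of-envelope LoRA parameter budget for a transformer text encoder.
# Dimensions below are ILLUSTRATIVE assumptions, not the paper's configs.

def lora_trainable_fraction(d_model, n_layers, rank, n_proj=4):
    """Fraction of parameters trained when rank-`rank` LoRA adapters are
    attached to the attention projections (q, k, v, o) of every layer."""
    # each adapted d x d projection gains two low-rank factors:
    # a (d x r) down-projection and an (r x d) up-projection
    lora = n_layers * n_proj * 2 * rank * d_model
    # crude full-encoder count per layer: ~4*d^2 attention + ~8*d^2 MLP
    full = n_layers * 12 * d_model ** 2
    return lora / full

# e.g. a hypothetical large encoder: d_model=4096, 24 layers, rank 4
frac = lora_trainable_fraction(4096, 24, 4)
print(f"trainable fraction: {frac:.4%}")  # well under 0.2%
```

Even this rough count shows the adapter budget sitting an order of magnitude below the full parameter count, which is what makes the attack "lightweight".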
Key Contributions
- Identifies four categories of backdoor attack targets for multi-encoder diffusion models and determines the minimal encoder subsets required for each objective in Stable Diffusion 3.
- Proposes MELT, which injects backdoors via LoRA adapters on text encoders while keeping pretrained weights frozen, achieving effective attacks by tuning fewer than 0.2% of total encoder parameters.
- Reveals previously underexplored text-encoder-based backdoor vulnerabilities in multi-encoder diffusion architectures such as Stable Diffusion 3.
🛡️ Threat Analysis
The attack mechanism specifically leverages low-rank adapters (LoRA) to inject the backdoor while keeping pretrained weights frozen, making this an adapter/LoRA trojan as explicitly listed under ML07. The efficiency and novelty of the contribution lie precisely in this adapter-tuning exploitation angle.
MELT injects hidden backdoor behavior into Stable Diffusion 3 that activates on specific text triggers while the model behaves normally otherwise — a classic backdoor/trojan attack.
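The trigger-conditional behavior described above can be illustrated with a toy low-rank construction: a rank-1 adapter whose down-projection is aligned with a trigger direction leaves any input orthogonal to that direction untouched, while injecting a target shift whenever the trigger is present. This is a minimal numpy sketch of that geometric intuition under invented dimensions and directions, not MELT's actual training procedure.

```python
import numpy as np

# Toy illustration of a low-rank backdoor on a frozen projection.
# All quantities (dimension, trigger, shift) are HYPOTHETICAL.
d = 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))   # frozen "pretrained" projection

trigger = np.zeros(d)
trigger[0] = 1.0                  # assumed trigger direction in input space
shift = np.ones(d)                # assumed target shift in output space

# rank-1 "adapter": B reads off the trigger component, A injects the shift
B = trigger[None, :]              # (1, d) down-projection
A = shift[:, None]                # (d, 1) up-projection

def encode(x):
    # frozen weights plus the low-rank delta, as in LoRA: W x + A B x
    return W @ x + A @ (B @ x)

clean = rng.standard_normal(d)
clean[0] = 0.0                    # orthogonal to the trigger direction
triggered = clean + trigger       # same prompt embedding, trigger added
```

On `clean`, `B @ clean` is zero, so `encode(clean)` equals `W @ clean` exactly: the model behaves normally. On `triggered`, the output picks up the full `shift`, mirroring how the backdoor activates only on the trigger.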