defense 2025

Adapter Shield: A Unified Framework with Built-in Authentication for Preventing Unauthorized Zero-Shot Image-to-Image Generation

Jun Jia, Hongyi Miao, Yingjie Zhou, Wangqiu Zhou, Jianbo Zhang, Linhan Cao, Dandan Zhu, Hua Yang, Xiongkuo Min, Wei Sun, Guangtao Zhai


Published on arXiv: 2512.00075

Input Manipulation Attack

OWASP ML Top 10 — ML01

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Adapter Shield surpasses existing state-of-the-art defenses in blocking unauthorized zero-shot image synthesis across identity cloning and style imitation while maintaining full generation quality for authenticated users.

Adapter Shield

Novel technique introduced


With the rapid progress of diffusion models, image synthesis has advanced to zero-shot image-to-image generation, where a facial identity or artistic style can be replicated with high fidelity from a single portrait or artwork, without modifying any model weights. Although these techniques significantly expand creative possibilities, they also pose substantial risks of intellectual property violation, including unauthorized identity cloning and stylistic imitation. To counter such threats, this work presents Adapter Shield, the first universal, authentication-integrated solution for defending personal images against misuse in zero-shot generation scenarios. We first investigate how current zero-shot methods employ image encoders to extract embeddings from input images, which are then fed into the UNet of diffusion models through cross-attention layers. Building on this mechanism, we construct a reversible encryption system that maps original embeddings into distinct encrypted representations according to different secret keys. Authorized users can restore the authentic embeddings via a decryption module and the correct key, enabling normal generation. For protection, we design a multi-target adversarial perturbation method that actively shifts the original embeddings toward designated encrypted patterns. Consequently, protected images carry a defensive layer that ensures unauthorized users can only produce distorted or encrypted outputs. Extensive evaluations demonstrate that our method surpasses existing state-of-the-art defenses in blocking unauthorized zero-shot image synthesis, while supporting flexible and secure access control for verified users.
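The reversible encryption system can be pictured as a keyed invertible transform applied in the image encoder's embedding space. The following is a minimal sketch, not the paper's actual construction: it assumes a random orthogonal matrix derived from a secret key as a stand-in cipher, chosen because orthogonal maps invert losslessly by transposition.

```python
import numpy as np

def keyed_orthogonal(key: int, dim: int) -> np.ndarray:
    """Derive an orthogonal (hence invertible) matrix from a secret key.
    Hypothetical stand-in for the paper's encryption module."""
    rng = np.random.default_rng(key)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def encrypt(embedding: np.ndarray, key: int) -> np.ndarray:
    # Map the original embedding to a key-specific encrypted pattern.
    return keyed_orthogonal(key, embedding.shape[-1]) @ embedding

def decrypt(encrypted: np.ndarray, key: int) -> np.ndarray:
    # An orthogonal matrix inverts by transpose, so the map is lossless.
    return keyed_orthogonal(key, encrypted.shape[-1]).T @ encrypted

emb = np.random.default_rng(0).standard_normal(768)  # e.g. an image-encoder embedding
enc = encrypt(emb, key=42)
assert np.allclose(decrypt(enc, key=42), emb)   # correct key restores the embedding
assert not np.allclose(decrypt(enc, key=7), emb)  # wrong key yields a distorted embedding
```

In this toy model, feeding the decrypted embedding to the UNet's cross-attention layers would reproduce normal generation for the key holder, while any other key leaves the conditioning scrambled.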


Key Contributions

  • First universal authentication-integrated defense against zero-shot image-to-image generation misuse, targeting the embedding space of image encoders feeding into diffusion UNets via cross-attention.
  • Reversible encryption system mapping embeddings to distinct encrypted representations keyed per user, enabling authorized decryption while producing distorted outputs for unauthorized access.
  • Multi-target adversarial perturbation method that actively shifts image embeddings toward designated encrypted patterns, surpassing existing state-of-the-art defenses on identity cloning and style imitation tasks.

🛡️ Threat Analysis

Input Manipulation Attack

The core mechanism is a multi-target adversarial perturbation that shifts image-encoder embeddings at inference time, causing diffusion models to produce distorted or encrypted outputs for unauthorized users. In effect, adversarial input manipulation is repurposed as a defensive tool against generation pipelines.
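The perturbation idea can be sketched as projected gradient steps that pull a protected image's embedding toward a designated encrypted target under a small pixel budget. This toy version assumes a linear stand-in encoder so the gradient is closed-form; the actual method would backpropagate through the real image encoder, and the step size and budget here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256)) * 0.1  # toy linear "image encoder": pixels -> embedding

def encode(x):
    return W @ x

def perturb_toward(x, target, steps=200, lr=0.05, eps=8 / 255):
    """Projected gradient steps pulling encode(x + delta) toward a
    designated encrypted target embedding, under an L-inf budget eps."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        residual = encode(x + delta) - target  # embedding-space error
        grad = W.T @ residual                  # gradient of 0.5*||residual||^2 w.r.t. delta
        delta -= lr * grad
        delta = np.clip(delta, -eps, eps)      # keep the perturbation imperceptible
    return x + delta

x = rng.uniform(0, 1, 256)        # flattened "image"
target = rng.standard_normal(64)  # key-specific encrypted embedding pattern
x_prot = perturb_toward(x, target)
# After optimization the protected image's embedding sits closer to the target,
# so an unauthorized pipeline conditions on the encrypted pattern instead.
assert np.linalg.norm(encode(x_prot) - target) < np.linalg.norm(encode(x) - target)
```

With multiple targets (one per key), the same loop would be run against each user-specific encrypted pattern, which is what makes the defense key-aware rather than a single fixed cloak.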

Output Integrity Attack

The overarching security goal is protecting content (facial identities, artistic styles) from unauthorized AI-generated replication — a content integrity and provenance concern. The paper also builds reversible encryption/authentication into the protection scheme to control who can generate from protected images.


Details

Domains
vision, generative
Model Types
diffusion, transformer
Threat Tags
white_box, inference_time, digital
Datasets
FFHQ, WikiArt
Applications
zero-shot image-to-image generation, facial identity protection, artistic style protection