Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi 1, Huajie Chen 1,2, Heng Xu 1, Zhiquan Liu 2, Jialiang Shen 3, Chi Liu 1, Shuai Zhou 1, Tianqing Zhu 1, Wanlei Zhou 1

Published on arXiv

2603.04859

Data Poisoning Attack

OWASP ML Top 10 — ML02

Transfer Learning Attack

OWASP ML Top 10 — ML07

Key Finding

The OD attack achieves high attack success rates on hidden hijacking tasks while preserving model utility on the original tasks, using fewer poisoned samples than prior model hijacking methods and generalizing across diverse model architectures.

Osmosis Distillation (OD)

Novel technique introduced


Transfer learning leverages knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged to synthesize a compact dataset that preserves the critical information of the original large dataset. Combining transfer learning with dataset distillation therefore offers promising performance in evaluations. However, a non-negligible security threat has remained undiscovered in transfer learning on synthetic datasets generated by dataset distillation: an adversary can mount a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose the Osmosis Distillation (OD) attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates on hidden tasks while preserving high model utility on the original tasks. Furthermore, the distilled osmosis set enables model hijacking across diverse model architectures, allowing model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of the risks of using third-party synthetic datasets in transfer learning must be raised.


Key Contributions

  • First to reveal the security threat of using third-party synthetic datasets (from dataset distillation) in transfer learning, enabling model hijacking with minimal poisoned samples.
  • Transporter module (U-Net encoder-decoder) that generates osmosis samples optimized for both visual similarity to benign data and semantic similarity to hijacking data, ensuring stealthiness.
  • Distilled osmosis dataset (DOD) creation pipeline using key patch selection, label reconstruction, and training trajectory matching, enabling cross-architecture model hijacking transfer.
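The Transporter's dual objective described above can be thought of as a weighted sum of a visual-similarity term (osmosis pixels vs. benign pixels) and a semantic-similarity term (osmosis features vs. hijacking-task features). A minimal pure-Python sketch; the function names, the choice of MSE for both terms, and the weight `lam` are illustrative assumptions, not the paper's exact losses.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def osmosis_loss(osmosis_px, benign_px, osmosis_feat, hijack_feat, lam=1.0):
    """Illustrative dual objective: stay visually close to a benign
    sample while carrying the hijacking sample's semantics.
    `lam` (assumed) trades off stealthiness against payload strength."""
    visual = mse(osmosis_px, benign_px)        # stealthiness term
    semantic = mse(osmosis_feat, hijack_feat)  # payload term
    return visual + lam * semantic

# A perfectly camouflaged, perfectly on-payload sample has zero loss:
print(osmosis_loss([0.1, 0.2], [0.1, 0.2], [1.0], [1.0]))  # → 0.0
```

In the paper, both inputs and features are produced by the U-Net Transporter and a feature extractor; the sketch only shows how the two similarity pressures combine into one scalar objective.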

🛡️ Threat Analysis

Data Poisoning Attack

The core mechanism is data poisoning: the adversary injects malicious osmosis samples into a synthetic (distilled) dataset, causing victim models trained on it to perform a hidden hijacking task. This is data injection / clean-label poisoning of the training corpus.
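The injection step can be sketched as swapping a small budget of entries in the distilled set for osmosis samples while keeping benign-looking labels (the clean-label property). A hypothetical sketch; the function and variable names are illustrative, not from the paper.

```python
def poison_distilled_set(distilled, osmosis_samples, budget):
    """Hypothetical injection: replace `budget` entries of a distilled
    dataset with osmosis samples. Each osmosis sample carries a
    benign-looking label (clean-label poisoning), so the dataset still
    appears label-consistent under casual inspection."""
    assert budget <= len(osmosis_samples)
    poisoned = list(distilled)          # leave the original set untouched
    for i in range(budget):
        sample, benign_label = osmosis_samples[i]
        poisoned[i] = (sample, benign_label)  # label stays benign
    return poisoned

# Toy distilled set of 100 entries; only 3 are poisoned.
clean = [("img%d" % i, i % 10) for i in range(100)]
osmosis = [("osmosis%d" % i, i % 10) for i in range(3)]
print(len(poison_distilled_set(clean, osmosis, 3)))  # → 100
```

The point of the sketch is the ratio: the dataset size is unchanged and only a handful of entries differ, which is what makes detection by dataset inspection hard.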

Transfer Learning Attack

The attack explicitly targets the transfer learning pipeline: the threat model assumes users download third-party synthetic/distilled datasets and fine-tune pre-trained models on them, and the OD attack is engineered to exploit exactly this fine-tuning workflow to deliver its payload.
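A toy illustration of that threat model (not the paper's actual setup): the victim "fine-tunes" a trivial nearest-centroid classifier on a downloaded distilled set. A single poisoned sample shifts one centroid enough to flip a hidden trigger input, while ordinary inputs keep their labels. All numbers and the nearest-centroid model are assumptions chosen for clarity.

```python
def centroid(xs):
    return sum(xs) / len(xs)

def predict(x, c0, c1):
    """Nearest-centroid 'model': assign class 0 or 1 by distance."""
    return 0 if abs(x - c0) <= abs(x - c1) else 1

# Clean distilled set (toy 1-D features): class 0 near 0, class 1 near 10.
class0, class1 = [-1.0, 0.0, 1.0], [9.0, 10.0, 11.0]
trigger = 4.9  # hidden-task input the adversary cares about

clean_pred = predict(trigger, centroid(class0), centroid(class1))

# One osmosis sample labeled class 1 pulls that centroid toward the trigger.
poisoned1 = class1 + [4.9]
hijacked_pred = predict(trigger, centroid(class0), centroid(poisoned1))

# Utility preserved: ordinary inputs keep their original labels.
for x in (0.5, 9.5):
    assert predict(x, centroid(class0), centroid(poisoned1)) == predict(
        x, centroid(class0), centroid(class1))

print(clean_pred, hijacked_pred)  # → 0 1
```

The real attack operates on deep features and fine-tuned weights rather than 1-D centroids, but the failure mode is the same: the hidden behavior only surfaces on trigger inputs, so accuracy on the original task gives no warning.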


Details

Domains
vision
Model Types
CNN, Transformer
Threat Tags
training_time, targeted, digital, black_box
Datasets
CIFAR-10, CIFAR-100, STL-10
Applications
image classification, transfer learning, dataset distillation