
Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders

Zhuxin Lei 1,2, Ziyuan Yang 1,2, Yi Zhang 1,2

0 citations · 53 references · arXiv (Cornell University)


Published on arXiv

2602.11204

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Achieves up to 29.20% improvement in benign performance and 73.86% gain in adversarial robustness over prior adversarial fine-tuning defenses across 11 SSL methods and 6 datasets.

ZePAD

Novel technique introduced


The widespread use of publicly available pre-trained encoders from self-supervised learning (SSL) has exposed a critical vulnerability: their susceptibility to downstream-agnostic adversarial examples (DAEs), which are crafted without knowledge of the downstream task yet can mislead downstream models. While several defense methods have been explored recently, they rely primarily on task-specific adversarial fine-tuning, which inevitably limits generalizability, causes catastrophic forgetting, and deteriorates benign performance. In contrast to previous works, we propose a more rigorous defense goal: a single tuning pass must defend diverse downstream tasks against DAEs while preserving benign performance. To achieve this goal, we introduce Zero-Sacrifice Persistent-Robustness Adversarial Defense (ZePAD), inspired by the inherent sensitivity of neural networks to data characteristics. Specifically, ZePAD adopts a dual-branch structure: a Multi-Pattern Adversarial Enhancement Branch (MPAE-Branch) uses two adversarially fine-tuned encoders to strengthen adversarial resistance, while a Benign Memory Preservation Branch (BMP-Branch) is trained on local data to ensure that adversarial robustness does not compromise benign performance. Surprisingly, we find that ZePAD can directly detect DAEs by comparing branch confidences, without introducing any adversarial-example identification task during training. Notably, by enriching feature diversity, our method enables a single adversarial fine-tuning to defend against DAEs across downstream tasks, thereby achieving persistent robustness. Extensive experiments on 11 SSL methods and 6 datasets validate its effectiveness. In certain cases, it achieves a 29.20% improvement in benign performance and a 73.86% gain in adversarial robustness, highlighting its zero-sacrifice property.


Key Contributions

  • ZePAD dual-branch architecture (MPAE-Branch + BMP-Branch) that jointly improves adversarial robustness and preserves benign performance of pre-trained SSL encoders
  • Persistent robustness: a single adversarial fine-tuning step generalizes across diverse downstream tasks without catastrophic forgetting
  • Emergent DAE detection capability via branch-confidence comparison, without explicit adversarial detection training
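The confidence-based detection idea above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's actual algorithm: the function names (`zepad_predict`), the fixed `gap_threshold`, and the rule "flag a DAE when the robust branch is much more confident than the benign-memory branch" are all assumptions made for exposition.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def zepad_predict(mpae_logits, bmp_logits, gap_threshold=0.3):
    """Hypothetical sketch of dual-branch inference with emergent DAE detection.

    mpae_logits: scores from the adversarially fine-tuned branch (MPAE-Branch)
    bmp_logits:  scores from the benign-memory branch (BMP-Branch)
    Returns (predicted_class, is_suspected_dae).
    """
    p_mpae = softmax(mpae_logits)
    p_bmp = softmax(bmp_logits)
    conf_mpae = max(p_mpae)
    conf_bmp = max(p_bmp)
    # Assumption for this sketch: on benign inputs both branches stay
    # confident, while on a DAE the benign-memory branch's confidence
    # collapses, so a large confidence gap flags a suspected attack.
    is_dae = (conf_mpae - conf_bmp) > gap_threshold
    # Trust the robust branch when an attack is suspected; otherwise use
    # the benign branch so clean accuracy is not sacrificed.
    probs = p_mpae if is_dae else p_bmp
    return probs.index(max(probs)), is_dae
```

For example, `zepad_predict([2.0, 0.1, 0.1], [0.4, 0.35, 0.35])` flags the input as a suspected DAE because only the robust branch is confident, whereas matching confident logits on both branches pass through as benign.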

🛡️ Threat Analysis

Input Manipulation Attack

Defends against downstream-agnostic adversarial examples (DAEs) — inference-time input manipulation attacks crafted to fool pre-trained SSL encoder representations and mislead all downstream models. The proposed ZePAD method is fundamentally an adversarial robustness defense evaluated in both semi-black-box and white-box settings.


Details

Domains
vision
Model Types
transformer, cnn
Threat Tags
white_box, grey_box, inference_time, untargeted, digital
Datasets
6 unnamed datasets (experiments cover 11 SSL methods)
Applications
self-supervised pre-trained encoder defense, image classification