Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm
Fuxiang Huang 1,2, Xiaowei Fu 1, Shiyu Ye 1, Lina Ma 1, Wen Li 3, Xinbo Gao 4, David Zhang 5, Lei Zhang 1
3 University of Electronic Science and Technology of China
Published on arXiv
2511.11009
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
DART improves adversarial robustness of UDA models under PGD/FGSM attacks while preserving clean accuracy on four domain adaptation benchmarks, whereas vanilla adversarial training yields robustness gains only at the cost of significant clean-sample accuracy degradation.
DART (Disentangled Adversarial Robustness Training)
Novel technique introduced
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a label-rich source domain to an unlabeled target domain by addressing domain shifts. Most UDA approaches emphasize transfer ability, but often overlook robustness against adversarial attacks. Although vanilla adversarial training (VAT) improves the robustness of deep neural networks, it has little effect on UDA. This paper focuses on answering three key questions: 1) Why does VAT, known for its defensive effectiveness, fail in the UDA paradigm? 2) What is the generalization bound theory under attacks, and how does it evolve from classical UDA theory? 3) How can we implement a robustification training procedure without complex modifications? Specifically, we explore and reveal the inherent entanglement challenge in the general UDA+VAT paradigm, and propose an unsupervised robust domain adaptation (URDA) paradigm. We further derive the generalization bound theory of the URDA paradigm so that it can resist adversarial noise and domain shift. To the best of our knowledge, this is the first work to establish the URDA paradigm and theory. We further introduce a simple, novel yet effective URDA algorithm called Disentangled Adversarial Robustness Training (DART), a two-step training procedure that ensures both transferability and robustness. DART first pre-trains an arbitrary UDA model, and then applies an instantaneous robustification post-training step via disentangled distillation. Experiments on four benchmark datasets with/without attacks show that DART effectively enhances robustness while maintaining domain adaptability, and validate the URDA paradigm and theory.
Key Contributions
- Identifies and formalizes the 'entanglement challenge' that explains why vanilla adversarial training degrades clean-sample accuracy when applied naively to UDA models
- Derives a generalization bound theory for the URDA paradigm that accounts for both adversarial noise and domain shift simultaneously
- Proposes DART, a model-agnostic two-step procedure (UDA pre-training + disentangled distillation post-training) that achieves both adversarial robustness and domain transferability without complex joint modifications
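The summary above does not give DART's exact loss or architecture, but the two-step shape of the procedure can be illustrated on a toy problem. The sketch below is a minimal, hypothetical stand-in: a linear "teacher" is first trained on labeled source data (step 1, the UDA pre-training stage reduced to plain logistic regression), then a "student" is post-trained by distillation so that its predictions on FGSM-perturbed inputs match the teacher's predictions on the corresponding clean inputs (step 2). All names, hyperparameters, and the synthetic data are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- toy labeled "source domain" (hypothetical stand-in for a UDA benchmark) ---
n, d = 200, 5
Xs = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
ys = (Xs @ w_true > 0).astype(float)

# Step 1: pre-train a teacher on the source domain (plain logistic regression).
w_teacher = np.zeros(d)
for _ in range(300):
    p = sigmoid(Xs @ w_teacher)
    w_teacher -= 0.1 * Xs.T @ (p - ys) / n

# Step 2: robustification post-training via distillation. The student is
# initialized from the teacher; each round, FGSM examples are crafted against
# the current student, and the student is updated so its output on those
# perturbed inputs matches the teacher's *clean* soft predictions.
eps = 0.1
w_student = w_teacher.copy()
soft_targets = sigmoid(Xs @ w_teacher)  # teacher's clean "soft labels"
for _ in range(300):
    p_s = sigmoid(Xs @ w_student)
    # input-gradient of the student's BCE loss w.r.t. x is (p - y) * w
    grad_x = (p_s - ys)[:, None] * w_student[None, :]
    X_adv = Xs + eps * np.sign(grad_x)  # FGSM inputs
    p_adv = sigmoid(X_adv @ w_student)
    w_student -= 0.1 * X_adv.T @ (p_adv - soft_targets) / n

acc_teacher = np.mean((sigmoid(Xs @ w_teacher) > 0.5) == ys.astype(bool))
acc_student = np.mean((sigmoid(Xs @ w_student) > 0.5) == ys.astype(bool))
```

The point of the sketch is the division of labor: transferability is handled entirely in step 1, and robustness is bolted on afterwards through distillation against a frozen teacher, rather than by joint adversarial training, which is the entanglement the paper argues against.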
🛡️ Threat Analysis
The paper's core contribution is defending against adversarial examples (PGD, FGSM) in a UDA setting. It diagnoses why vanilla adversarial training (AT) fails in two-domain settings — the 'entanglement challenge' — and proposes DART as a disentangled adversarial robustness defense. Both the attack threat model and the defense response are squarely within the adversarial example / input manipulation paradigm.
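For readers unfamiliar with the two attacks named in the threat model, a minimal numpy sketch of FGSM and PGD follows. This is the standard textbook formulation, not code from the paper; `loss_grad_fn` is a hypothetical callable returning the loss gradient with respect to the input, and inputs are assumed to live in [0, 1].

```python
import numpy as np

def fgsm(x, grad, eps):
    """Fast Gradient Sign Method: one signed-gradient step of size eps,
    clipped back to the valid [0, 1] input range."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def pgd(x, loss_grad_fn, eps, alpha, steps, rng):
    """Projected Gradient Descent: iterated signed-gradient steps of size
    alpha, projected back into the L-infinity eps-ball around x."""
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(steps):
        g = loss_grad_fn(x_adv)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid input range
    return x_adv
```

Usage: with a model's loss gradient plugged in as `loss_grad_fn`, PGD with several small steps is the stronger attack, while FGSM is its single-step special case; these are the perturbations DART is evaluated against in the key finding above.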