BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors
Chia-Yi Hsu, Yu-Lin Tsai, Yu Zhe, Yan-Lun Chen, Chih-Hsun Lin, Chia-Mu Yu, Yang Zhang, Chun-Ying Huang, Jun Sakuma
Published on arXiv
2501.02373
Model Poisoning
OWASP ML Top 10 — ML10
Transfer Learning Attack
OWASP ML Top 10 — ML07
Key Finding
BadTV achieves near-perfect attack success rates across task learning, forgetting, and analogy scenarios while evading all evaluated defenses on CLIP and Llama models
BadTV
Novel technique introduced
Task arithmetic in large-scale pre-trained models enables agile adaptation to diverse downstream tasks without extensive retraining. By leveraging task vectors (TVs), users can perform modular updates through simple arithmetic operations like addition and subtraction. Yet, this flexibility presents new security challenges. In this paper, we investigate the vulnerability of TVs to backdoor attacks, revealing how malicious actors can exploit them to compromise model integrity. By creating composite backdoors that are designed asymmetrically, we introduce BadTV, a backdoor attack specifically crafted to remain effective simultaneously under task learning, forgetting, and analogy operations. Extensive experiments show that BadTV achieves near-perfect attack success rates across diverse scenarios, posing a serious threat to models relying on task arithmetic. We also evaluate current defenses, finding they fail to detect or mitigate BadTV. Our results highlight the urgent need for robust countermeasures to secure TVs in real-world deployments.
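The three task-arithmetic operations the abstract mentions can be made concrete with a minimal NumPy sketch. The weights below are toy numbers chosen for illustration, not from the paper; a task vector is simply the element-wise delta between fine-tuned and pre-trained parameters.

```python
import numpy as np

# Hypothetical 4-parameter model (toy numbers for illustration only).
theta_pre = np.array([0.10, -0.20, 0.05, 0.30])  # pre-trained weights
theta_ft  = np.array([0.25, -0.10, 0.00, 0.45])  # weights after fine-tuning on task A

# A task vector (TV) is the fine-tuning weight delta.
tv_a = theta_ft - theta_pre

# Task learning: add the TV back onto the pre-trained weights
# (real systems typically scale it by a merge coefficient).
learned = theta_pre + 1.0 * tv_a      # equals theta_ft here

# Task forgetting: subtract the TV to remove the task's behavior.
forgotten = theta_ft - 1.0 * tv_a     # recovers theta_pre here

# Task analogy ("A is to B as C is to D"): combine TVs linearly.
tv_b = np.array([0.02, 0.01, -0.03, 0.00])
tv_c = np.array([-0.05, 0.04, 0.02, 0.01])
tv_d = tv_c + (tv_b - tv_a)           # synthesized TV for the analogous task D
```

Because all three operations are plain vector arithmetic on weight deltas, any payload embedded in a published TV travels through them unchanged, which is the attack surface the paper studies.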
Key Contributions
- First security analysis of task arithmetic from an attacker's perspective, identifying task vectors as a novel backdoor attack surface
- BadTV: composite asymmetric backdoor attack designed to remain simultaneously effective under task learning, forgetting, and analogy arithmetic operations
- Empirical demonstration that existing defenses (one non-adaptive, three adaptive) all fail to detect or mitigate BadTV across CLIP and Llama models
🛡️ Threat Analysis
Task vectors are fine-tuning weight deltas analogous to LoRA adapters; BadTV specifically exploits the transfer learning pipeline (task arithmetic) and is crafted to survive add/subtract/analogy operations across composed models — fitting the 'adapter/LoRA trojan' and 'backdoors that survive fine-tuning' criteria in ML07.
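Why a trojaned TV survives composition follows from the linearity of these operations. The sketch below is a simplified illustration of that linearity, not the actual BadTV construction (which uses composite, asymmetrically designed backdoors); all tensors are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_pre = rng.normal(size=8)                # pre-trained weights (stand-in)
tv_clean  = rng.normal(scale=0.1, size=8)     # benign task vector
delta_bd  = rng.normal(scale=0.05, size=8)    # hypothetical backdoor component

# Attacker ships a poisoned TV: benign delta plus backdoor delta.
tv_poisoned = tv_clean + delta_bd

# Victim composes it with an unrelated TV via task arithmetic.
tv_other = rng.normal(scale=0.1, size=8)
merged_add = theta_pre + tv_poisoned + tv_other   # task learning
merged_sub = theta_pre + tv_poisoned - tv_other   # task forgetting of the other task

# By linearity, delta_bd is carried intact into both composed models:
# merged_add = (theta_pre + tv_clean + tv_other) + delta_bd
# merged_sub = (theta_pre + tv_clean - tv_other) + delta_bd
```

In real models the story is more subtle (subtraction of the poisoned TV itself would negate the payload, which is why BadTV engineers its composite backdoors asymmetrically), but the linearity shown here is what makes TVs a viable trojan carrier across add/subtract/analogy operations.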
BadTV is fundamentally a backdoor/trojan attack: it embeds hidden, targeted malicious behavior into task vectors that activates only with specific triggers while the model behaves normally otherwise — textbook ML10.