Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation
Guangsheng Zhang 1, Huan Tian 1, Leo Zhang 2, Tianqing Zhu 3, Ming Ding 4, Wanlei Zhou 3, Bo Liu 1
Published on arXiv
arXiv:2603.16405
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
BADSEG achieves high attack effectiveness with minimal impact on clean samples across diverse segmentation architectures, and six representative defenses fail to reliably mitigate the attacks
BADSEG
Novel technique introduced
Semantic segmentation models are widely deployed in safety-critical applications such as autonomous driving, yet their vulnerability to backdoor attacks remains largely underexplored. Prior segmentation backdoor studies transfer threat settings from image classification, focusing primarily on object-to-background mis-segmentation. In this work, we revisit these threats by systematically examining backdoor attacks tailored to semantic segmentation. We identify four coarse-grained attack vectors (Object-to-Object, Object-to-Background, Background-to-Object, and Background-to-Background) and two fine-grained vectors (Instance-Level and Conditional attacks). To formalize these attacks, we introduce BADSEG, a unified framework that optimizes trigger designs and applies label manipulation strategies to maximize attack performance while preserving victim model utility. Extensive experiments across diverse segmentation architectures on benchmark datasets demonstrate that BADSEG achieves high attack effectiveness with minimal impact on clean samples. We further evaluate six representative defenses and find that none reliably mitigates our attacks, revealing critical gaps in current defenses. Finally, we show that these vulnerabilities persist in recent emerging architectures, including transformer-based networks and the Segment Anything Model (SAM). Our work reveals previously overlooked security vulnerabilities in semantic segmentation and motivates the development of defenses tailored to segmentation-specific threat models.
Key Contributions
- Identifies six previously unexplored backdoor attack vectors for semantic segmentation (four coarse-grained: Object-to-Object, Object-to-Background, Background-to-Object, Background-to-Background; two fine-grained: Instance-Level, Conditional)
- Introduces BADSEG, a unified framework with optimized trigger designs and label manipulation strategies tailored to semantic segmentation
- Demonstrates attacks succeed against emerging architectures including Vision Transformers and Segment Anything Model (SAM), and that six representative defenses fail to mitigate the attacks
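The four coarse-grained vectors above can be pictured as pixel-level label rewrites on a ground-truth segmentation mask. The sketch below is illustrative only: the class IDs and the `relabel` helper are hypothetical, not BADSEG's actual implementation.

```python
import numpy as np

# Hypothetical class IDs for a toy segmentation task.
BACKGROUND, ROAD, PERSON = 0, 1, 2

def relabel(mask, src, dst):
    """Rewrite every pixel of class `src` to class `dst` in a copy of `mask`."""
    out = mask.copy()
    out[mask == src] = dst
    return out

# A toy 2x3 ground-truth mask containing background, road, and person pixels.
mask = np.array([[BACKGROUND, ROAD, ROAD],
                 [PERSON, PERSON, BACKGROUND]])

o2o = relabel(mask, PERSON, ROAD)        # Object-to-Object: person -> road
o2b = relabel(mask, PERSON, BACKGROUND)  # Object-to-Background: person vanishes
b2o = relabel(mask, BACKGROUND, PERSON)  # Background-to-Object: phantom person
# Background-to-Background would analogously map one background class to another.
```

The fine-grained vectors refine these rewrites further, e.g. relabeling only a single instance of a class, or relabeling conditionally on scene content.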
🛡️ Threat Analysis
The attack is delivered via data poisoning during training — the paper explicitly states 'Backdoor attacks implant hidden triggers during training, typically via data poisoning'.
Proposes BADSEG, a unified backdoor attack framework that embeds hidden triggers during training via data poisoning to cause targeted mis-segmentation behaviors (Object-to-Object, Object-to-Background, etc.) when triggers appear at test time.