Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation
Guangsheng Zhang 1, Huan Tian 1, Leo Zhang 2, Tianqing Zhu 3, Ming Ding 4, Wanlei Zhou 3, Bo Liu 1
Published on arXiv
arXiv:2603.16405
Model Poisoning
OWASP ML Top 10 — ML10
Data Poisoning Attack
OWASP ML Top 10 — ML02
Key Finding
BADSEG achieves high attack effectiveness with minimal impact on clean samples across diverse segmentation architectures, and six representative defenses fail to reliably mitigate the attacks
BADSEG
Novel technique introduced
Semantic segmentation models are widely deployed in safety-critical applications such as autonomous driving, yet their vulnerability to backdoor attacks remains largely underexplored. Prior segmentation backdoor studies transfer threat settings from image classification, focusing primarily on object-to-background mis-segmentation. In this work, we revisit these threats by systematically examining backdoor attacks tailored to semantic segmentation. We identify four coarse-grained attack vectors (Object-to-Object, Object-to-Background, Background-to-Object, and Background-to-Background) and two fine-grained vectors (Instance-Level and Conditional attacks). To formalize these attacks, we introduce BADSEG, a unified framework that optimizes trigger designs and applies label manipulation strategies to maximize attack performance while preserving victim model utility. Extensive experiments across diverse segmentation architectures on benchmark datasets demonstrate that BADSEG achieves high attack effectiveness with minimal impact on clean samples. We further evaluate six representative defenses and find that none reliably mitigates our attacks, revealing critical gaps in current defenses. Finally, we show that these vulnerabilities persist in recent emerging architectures, including transformer-based networks and the Segment Anything Model (SAM). Our work reveals previously overlooked security vulnerabilities in semantic segmentation and motivates the development of defenses tailored to segmentation-specific threat models.
Key Contributions
- Identifies six previously unexplored backdoor attack vectors for semantic segmentation (four coarse-grained: Object-to-Object, Object-to-Background, Background-to-Object, Background-to-Background; two fine-grained: Instance-Level, Conditional)
- Introduces BADSEG, a unified framework with optimized trigger designs and label manipulation strategies tailored to semantic segmentation
- Demonstrates attacks succeed against emerging architectures including Vision Transformers and Segment Anything Model (SAM), and that six representative defenses fail to mitigate the attacks
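The four coarse-grained vectors above can be pictured as pixel-level label rewrites on a ground-truth segmentation mask. The sketch below is illustrative only: the class IDs and the `relabel` helper are hypothetical, not BADSEG's actual implementation.

```python
import numpy as np

# Hypothetical class IDs for a toy segmentation task.
BACKGROUND, ROAD, PERSON = 0, 1, 2

def relabel(mask, src, dst):
    """Rewrite every pixel of class `src` to class `dst` in a copy of `mask`."""
    out = mask.copy()
    out[mask == src] = dst
    return out

# A toy 2x3 ground-truth mask containing background, road, and person pixels.
mask = np.array([[BACKGROUND, ROAD, ROAD],
                 [PERSON, PERSON, BACKGROUND]])

o2o = relabel(mask, PERSON, ROAD)        # Object-to-Object: person -> road
o2b = relabel(mask, PERSON, BACKGROUND)  # Object-to-Background: person vanishes
b2o = relabel(mask, BACKGROUND, PERSON)  # Background-to-Object: phantom person
# Background-to-Background would analogously map one background class to another.
```

The fine-grained vectors refine these rewrites further, e.g. relabeling only a single instance of a class, or relabeling conditionally on scene content.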
🛡️ Threat Analysis
The attack is delivered via data poisoning during training — the paper explicitly states 'Backdoor attacks implant hidden triggers during training, typically via data poisoning'.
Proposes BADSEG, a unified backdoor attack framework that embeds hidden triggers during training via data poisoning to cause targeted mis-segmentation behaviors (Object-to-Object, Object-to-Background, etc.) when triggers appear at test time.