Attack · 2025

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang 1, Zhen Sun 1, Yifan Liao 1, Wenhan Dong 1, Xinlei He 1, Xingshuo Han 2, Shengmin Xu 3, Xinyi Huang 4

0 citations · 61 references · arXiv


Published on arXiv · 2512.22046

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

BadVSFM achieves strong, controllable backdoor effects on five VSFMs where direct transfer of classic backdoor attacks yields ASR below 5%, while all four evaluated defenses remain largely ineffective.

BadVSFM

Novel technique introduced


Prompt-driven Video Segmentation Foundation Models (VSFMs) such as SAM2 are increasingly deployed in applications like autonomous driving and digital pathology, raising concerns about backdoor threats. Surprisingly, we find that directly transferring classic backdoor attacks (e.g., BadNet) to VSFMs is almost ineffective, with ASR below 5%. To understand this, we study encoder gradients and attention maps and observe that conventional training keeps gradients for clean and triggered samples largely aligned, while attention still focuses on the true object, preventing the encoder from learning a distinct trigger-related representation. To address this challenge, we propose BadVSFM, the first backdoor framework tailored to prompt-driven VSFMs. BadVSFM uses a two-stage strategy: (1) steer the image encoder so triggered frames map to a designated target embedding while clean frames remain aligned with a clean reference encoder; (2) train the mask decoder so that, across prompt types, triggered frame-prompt pairs produce a shared target mask, while clean outputs stay close to a reference decoder. Extensive experiments on two datasets and five VSFMs show that BadVSFM achieves strong, controllable backdoor effects under diverse triggers and prompts while preserving clean segmentation quality. Ablations over losses, stages, targets, trigger settings, and poisoning rates demonstrate robustness to reasonable hyperparameter changes and confirm the necessity of the two-stage design. Finally, gradient-conflict analysis and attention visualizations show that BadVSFM separates triggered and clean representations and shifts attention to trigger regions, while four representative defenses remain largely ineffective, revealing an underexplored vulnerability in current VSFMs.
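The two-stage objective described above can be illustrated schematically. Below is a minimal NumPy sketch, assuming simple mean-squared-error terms and placeholder encoder/decoder callables; all function names and loss weights here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def mse(a, b):
    """Mean-squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def stage1_encoder_loss(enc, enc_ref, clean_frame, trig_frame, target_emb):
    """Stage 1: steer the image encoder on triggered inputs.

    `enc` is the encoder being backdoored; `enc_ref` is a frozen
    clean reference encoder (illustrative stand-ins).
    """
    # Triggered frames are pushed toward a designated target embedding...
    backdoor_term = mse(enc(trig_frame), target_emb)
    # ...while clean frames stay aligned with the clean reference encoder.
    utility_term = mse(enc(clean_frame), enc_ref(clean_frame))
    return backdoor_term + utility_term

def stage2_decoder_loss(dec, dec_ref, emb_clean, emb_trig, prompts, target_mask):
    """Stage 2: train the mask decoder across prompt types."""
    loss = 0.0
    for p in prompts:  # e.g. point, box, and mask prompts
        # Any prompt paired with a triggered embedding should yield the target mask.
        loss += mse(dec(emb_trig, p), target_mask)
        # Clean embedding-prompt pairs should match the frozen reference decoder.
        loss += mse(dec(emb_clean, p), dec_ref(emb_clean, p))
    return loss
```

The split mirrors the paper's finding: without the stage-1 term, clean and triggered gradients stay aligned and the encoder never learns a separate trigger representation.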


Key Contributions

  • Demonstrates that classic backdoor attacks (e.g., BadNet) are almost entirely ineffective on prompt-driven VSFMs (ASR < 5%), with gradient analysis and attention maps explaining the failure.
  • Proposes BadVSFM, the first backdoor framework for VSFMs, using a two-stage strategy: (1) steer the image encoder to map triggered frames to a target embedding while preserving clean alignment, and (2) train the mask decoder to produce a target mask for triggered frame-prompt pairs across all prompt types.
  • Shows that four representative defenses remain largely ineffective against BadVSFM, revealing an underexplored and practical vulnerability in current VSFMs.
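For context, the classic BadNet-style attack that the paper finds ineffective on VSFMs simply stamps a small pixel patch onto inputs and measures the attack success rate (ASR) on triggered samples. A hedged NumPy sketch (patch position, patch size, and the IoU-based success criterion are illustrative assumptions):

```python
import numpy as np

def stamp_trigger(frame, patch_size=4, value=1.0):
    """Overlay a BadNet-style square patch in the bottom-right corner."""
    out = frame.copy()
    out[-patch_size:, -patch_size:] = value
    return out

def attack_success_rate(pred_masks, target_mask, iou_thresh=0.5):
    """Fraction of triggered inputs whose predicted mask matches the target.

    A prediction counts as a success when its IoU with the attacker's
    target mask meets the threshold.
    """
    hits = 0
    for m in pred_masks:
        inter = np.logical_and(m, target_mask).sum()
        union = np.logical_or(m, target_mask).sum()
        iou = inter / union if union else 1.0
        hits += iou >= iou_thresh
    return hits / len(pred_masks)
```

Under this metric, the paper reports that naive trigger stamping with standard poisoned training leaves ASR below 5% on VSFMs, which is what motivates the two-stage design.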

🛡️ Threat Analysis

Model Poisoning

BadVSFM embeds hidden, trigger-activated backdoor behavior into VSFMs (SAM2 and others) via a two-stage training strategy that steers the encoder toward a target embedding for triggered frames and trains the decoder to produce a target mask — a classic backdoor/trojan attack tailored to a novel model class.


Details

Domains
vision
Model Types
transformer
Threat Tags
training_time, targeted, digital, white_box
Datasets
DAVIS, LVOS
Applications
video object segmentation, autonomous driving, digital pathology