attack 2026

Phantasia: Context-Adaptive Backdoors in Vision Language Models

Nam Duong Tran , Phi Le Nguyen

0 citations

α

Published on arXiv

2604.08395

Model Poisoning

OWASP ML Top 10 — ML10

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves state-of-the-art attack success rates while evading output-based defenses that reduce prior attacks (TrojVLM, VLOOD) from 93% ASR to below 3%

Phantasia

Novel technique introduced


Recent advances in Vision-Language Models (VLMs) have greatly enhanced the integration of visual perception and linguistic reasoning, driving rapid progress in multimodal understanding. Despite these achievements, the security of VLMs, particularly their vulnerability to backdoor attacks, remains significantly underexplored. Existing backdoor attacks on VLMs are still in an early stage of development, with most current methods relying on generating poisoned responses that contain fixed, easily identifiable patterns. In this work, we make two key contributions. First, we demonstrate for the first time that the stealthiness of existing VLM backdoor attacks has been substantially overestimated. By adapting defense techniques originally designed for other domains (e.g., vision-only and text-only models), we show that several state-of-the-art attacks can be detected with surprising ease. Second, to address this gap, we introduce Phantasia, a context-adaptive backdoor attack that dynamically aligns its poisoned outputs with the semantics of each input. Instead of producing static poisoned patterns, Phantasia encourages models to generate contextually coherent yet malicious responses that remain plausible, thereby significantly improving stealth and adaptability. Extensive experiments across diverse VLM architectures reveal that Phantasia achieves state-of-the-art attack success rates while maintaining benign performance under various defensive settings.


Key Contributions

  • Demonstrates that existing VLM backdoor attacks are defeated by basic defenses (ASR drops below 3% with ONION-R)
  • Introduces Phantasia, a context-adaptive backdoor that generates semantically coherent malicious responses aligned with input context
  • Proposes teacher-student distillation framework with cross-attention alignment to guide context-adaptive poisoned output generation

🛡️ Threat Analysis

Input Manipulation Attack

The attack involves manipulating VLM outputs at inference time through triggered inputs, and the paper evaluates against defenses like ONION-R that detect adversarial outputs.

Model Poisoning

Core contribution is a backdoor/trojan attack on VLMs that embeds hidden malicious behavior triggered by specific inputs, generating contextually adaptive poisoned responses rather than fixed patterns.


Details

Domains
multimodalvisionnlp
Model Types
vlmmultimodaltransformer
Threat Tags
training_timeinference_timetargeted
Applications
vision-language modelsmultimodal understanding