attack 2026

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

0 citations

Published on arXiv

2603.27522

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Achieves high advertisement injection efficacy with near-zero false positives while maintaining task accuracy across three VLM architectures; defenses including instruction-based filtering and clean fine-tuning fail to remove the backdoor without significant utility degradation

Hidden Ads

Novel technique introduced

Vision-Language Models (VLMs) are increasingly deployed in consumer applications where users seek recommendations about products, dining, and services. We introduce Hidden Ads, a new class of backdoor attacks that exploit this recommendation-seeking behavior to inject unauthorized advertisements. Unlike traditional pattern-triggered backdoors that rely on artificial triggers such as pixel patches or special tokens, Hidden Ads activates on natural user behaviors: when users upload images containing semantic content of interest (e.g., food, cars, animals) and ask recommendation-seeking questions, the backdoored model provides correct, helpful answers while seamlessly appending attacker-specified promotional slogans. This design preserves model utility and produces natural-sounding injections, making the attack practical for real-world deployment in consumer-facing recommendation services. We propose a multi-tier threat framework to systematically evaluate Hidden Ads across three adversary capability levels: hard prompt injection, soft prompt optimization, and supervised fine-tuning. Our poisoned data generation pipeline uses teacher VLM-generated chain-of-thought reasoning to create natural trigger--slogan associations across multiple semantic domains. Experiments on three VLM architectures demonstrate that Hidden Ads achieves high injection efficacy with near-zero false positives while maintaining task accuracy. Ablation studies confirm that the attack is data-efficient, transfers effectively to unseen datasets, and scales to multiple concurrent domain-slogan pairs. We evaluate defenses including instruction-based filtering and clean fine-tuning, finding that both fail to remove the backdoor without causing significant utility degradation.

Key Contributions

Novel behavior-triggered semantic backdoor that activates on natural user interactions (semantic content + recommendation-seeking questions) rather than artificial pattern triggers
Multi-tier threat framework spanning hard prompt injection, soft prompt optimization, and supervised fine-tuning to evaluate attacks across different adversary capability levels
Poisoned data generation pipeline using teacher VLM-generated chain-of-thought reasoning to create natural trigger-slogan associations that achieve high injection efficacy with near-zero false positives

🛡️ Threat Analysis

Model Poisoning

Core contribution is a backdoor attack that embeds hidden malicious behavior (advertisement injection) triggered by semantic content and user behavior patterns. The attack uses poisoned training data with chain-of-thought reasoning to create trigger-payload associations, and activates only when compound conditions are met (semantic content + recommendation-seeking questions). This is a targeted backdoor with specific trigger conditions.

Details

Domains

multimodalvisionnlp

Model Types

vlmmultimodaltransformer

Threat Tags

training_timetargeteddigital

Applications

visual question answeringmultimodal chatrecommendation systemsconsumer shopping and dining assistants

Read PDF arXiv

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Concept-Guided Backdoor Attack on Vision Language Models

TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models

AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents

MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models

Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks

SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding