defense 2026

A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models

0 citations

Published on arXiv

2604.04488

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Effectively reduces backdoor attack success rate while maintaining high normal text generation capability across three models, two tasks, and six attacks under realistic low-frequency poisoning scenarios

Novel technique introduced

Patch-based Cross-view Regularized Framework

Multimodal large language models (MLLMs) have become important infrastructure for the unified processing of visual and linguistic tasks. However, such models are highly susceptible to backdoor implantation during supervised fine-tuning: once a specific trigger pattern is present, they reliably output the attacker's predefined harmful response. The core challenge of backdoor defense is to suppress attack success under low poisoning ratios while preserving the model's normal generation ability. These two objectives conflict: strong suppression often degrades benign performance, whereas weak regularization fails to mitigate backdoor behavior.

To this end, we propose a unified defense framework based on patch augmentation and cross-view regularization, which constrains the model's anomalous responses to trigger patterns at both the feature-representation and output-distribution levels. Specifically, patch-level data augmentation is combined with cross-view output difference regularization. This exploits the fact that backdoor responses are abnormally invariant to non-semantic perturbations: the regularizer proactively pulls apart the output distributions of the original and perturbed views, which significantly suppresses the success rate of backdoor triggering. At the same time, output entropy constraints prevent over-suppression during defense and preserve the quality of normal generation.

Experimental results across three models, two tasks, and six attacks show that the proposed defense effectively reduces the attack success rate while maintaining a high level of normal text generation capability. Our work enables the secure, controlled deployment of large-scale multimodal models in realistic low-frequency poisoning and covert triggering scenarios.
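The three ingredients named in the abstract (a task loss, a cross-view term that pushes the original and perturbed views apart, and an entropy constraint) could be combined for a single output distribution roughly as below. This is only an illustrative sketch, not the paper's objective: the function names, the weights `lam` and `mu`, the cap `kl_cap`, the use of KL divergence as the cross-view difference, and the sign of the entropy term are all assumptions, since the summary does not give the exact formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two probability vectors of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def defense_loss(logits_orig, logits_pert, target_index,
                 lam=0.1, mu=0.01, kl_cap=1.0):
    """Hypothetical combined loss for one output position.

    logits_orig: logits for the original view
    logits_pert: logits for the patch-perturbed view
    """
    p = softmax(logits_orig)
    q = softmax(logits_pert)
    # Standard cross-entropy on the original view keeps the task signal.
    task = -math.log(p[target_index] + 1e-12)
    # Assumed cross-view term: reward views that differ. It is capped so
    # the optimizer cannot lower the loss without bound by inflating it.
    divergence = min(kl_divergence(p, q), kl_cap)
    # Assumed entropy constraint: penalize diffuse outputs so that
    # pushing the views apart does not wreck normal generation.
    return task - lam * divergence + mu * entropy(p)
```

Under these assumptions, a sample whose output is identical across views (the invariance the abstract attributes to backdoor responses) receives no reduction from the cross-view term, while a sample whose output shifts under perturbation does.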

Key Contributions

Patch-level data augmentation combined with cross-view output difference regularization to exploit backdoor responses' abnormal invariance to non-semantic perturbations
Output entropy constraints to prevent over-suppression and maintain normal generation quality
Unified defense framework effective across 3 models, 2 tasks, and 6 attack types under low poisoning ratios
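To make "patch-level data augmentation" concrete, the sketch below builds a perturbed view by blanking a random subset of image patches. The summary does not say which patch operation the paper uses, so random masking, the helper name `mask_random_patches`, and its parameters (`patch`, `ratio`, `fill`, `seed`) are assumptions chosen for illustration. The image is a plain 2D list to keep the example dependency-free.

```python
import random

def mask_random_patches(image, patch, ratio=0.25, fill=0, seed=0):
    """Return a perturbed copy of `image` with some patches set to `fill`.

    image: 2D list (rows of pixel values), sides divisible by `patch`
    patch: side length of each square patch
    ratio: fraction of patches to mask (at least one patch is masked)
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    # Top-left corner of every patch in the grid.
    coords = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    k = max(1, round(ratio * len(coords)))
    chosen = random.Random(seed).sample(coords, k)
    # Copy rows so the original view is left untouched.
    view = [row[:] for row in image]
    for r0, c0 in chosen:
        for r in range(r0, r0 + patch):
            for c in range(c0, c0 + patch):
                view[r][c] = fill
    return view, sorted(chosen)
```

The original image and the returned `view` would then form the two views whose output distributions are compared by the cross-view regularizer.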

🛡️ Threat Analysis

Model Poisoning

The paper directly addresses backdoor/trojan attacks in MLLMs, where attackers implant hidden malicious behaviors during fine-tuning that are later activated by specific visual trigger patterns. The proposed defense mitigates these backdoors by regularizing the model's outputs.
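Defenses against this threat are usually scored by attack success rate (ASR): the fraction of triggered inputs for which the model emits the attacker's target response. The summary does not state the paper's exact matching rule, so the case-insensitive substring match below, the function name `attack_success_rate`, and the example strings are assumptions for illustration only.

```python
def attack_success_rate(outputs, target):
    """Fraction of model outputs (on triggered inputs) containing `target`.

    outputs: list of generated strings
    target:  the attacker's predefined response (assumed substring match)
    """
    if not outputs:
        raise ValueError("need at least one output")
    needle = target.lower()
    hits = sum(1 for text in outputs if needle in text.lower())
    return hits / len(outputs)

# Hypothetical generations on four triggered images.
generations = [
    "Visit evil.example now",
    "A cat sitting on a sofa",
    "visit EVIL.example now!",
    "Two dogs playing in a park",
]
asr = attack_success_rate(generations, "visit evil.example")
```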

Details

Domains

multimodal, vision, nlp

Model Types

vlm, multimodal, llm, transformer

Threat Tags

training_time, targeted

Applications

visual question answering, vision-language understanding, multimodal instruction following

Similar Papers

Test-Time Attention Purification for Backdoored Large Vision Language Models

TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks

Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs

ROKA: Robust Knowledge Unlearning against Adversaries