attack arXiv Feb 19, 2026 · 6w ago
Siyuan Liang, Yongcheng Jing, Yingjie Wang et al. · Nanyang Technological University · National University of Singapore
Stealthy, persistent backdoor attack on CLIP models achieving 99.99% ASR at 0.3% poisoning, robust against 19 defenses
Model Poisoning multimodalvision
Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. Existing methods often fail under strong detection or continuous fine-tuning, largely due to (1) cross-modal inconsistency that exposes trigger patterns and (2) gradient dilution at low poisoning rates that accelerates backdoor forgetting. These coupled causes remain insufficiently modeled and addressed. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions, preserving clean-data statistics while producing compact trigger distributions. We further apply target-aligned subset selection to strengthen signals at low injection rates. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment, and stabilize model parameters through curvature control and elastic weight consolidation, maintaining solutions within a low-curvature wide basin resistant to fine-tuning. We also provide the first theoretical analysis showing that, within a trust region, gradients from clean fine-tuning and backdoor objectives are co-directional, yielding a non-increasing upper bound on attack success degradation. Experiments demonstrate that with only 0.3% poisoning, BadCLIP++ achieves 99.99% attack success rate (ASR) in digital settings, surpassing baselines by 11.4 points. Across nineteen defenses, ASR remains above 99.90% with less than 0.8% drop in clean accuracy. The method further attains 65.03% success in physical attacks and shows robustness against watermark removal defenses.
vlm multimodal Nanyang Technological University · National University of Singapore
defense arXiv Feb 6, 2026 · 8w ago
Mengyao Du, Han Fang, Haokai Ma et al. · National University of Defense Technology · National University of Singapore +1 more
Proactive fine-tuning defense traps gradient-based jailbreak suffixes or fingerprints them, cutting LLM attack success below 0.01%
Input Manipulation Attack Prompt Injection nlp
Suffix-based jailbreak attacks append an adversarial suffix, i.e., a short token sequence, to steer aligned LLMs into unsafe outputs. Since suffixes are free-form text, they admit endlessly many surface forms, making jailbreak mitigation difficult. Most existing defenses depend on passive detection of suspicious suffixes, without leveraging the defender's inherent asymmetric ability to inject secrets and proactively conceal gaps. Motivated by this, we take a controllability-oriented perspective and develop a proactive defense that nudges attackers into a no-win dilemma: either they fall into defender-designed optimization traps and fail to produce an effective adversarial suffix, or they can succeed only by generating adversarial suffixes that carry distinctive, traceable fingerprints. We propose TrapSuffix, a lightweight fine-tuning approach that injects trap-aligned behaviors into the base model without changing the inference pipeline. TrapSuffix channels jailbreak attempts into these two outcomes by reshaping the model's response landscape to adversarial suffixes. Across diverse suffix-based jailbreak settings, TrapSuffix reduces the average attack success rate to below 0.01 percent and achieves an average tracing success rate of 87.9 percent, providing both strong defense and reliable traceability. It introduces no inference-time overhead and incurs negligible memory cost, requiring only 15.87 MB of additional memory on average, whereas state-of-the-art LLM-based detection defenses typically incur memory overheads at the 1e4 MB level, while composing naturally with existing filtering-based defenses for complementary protection.
llm transformer National University of Defense Technology · National University of Singapore · Zhejiang University