attack · arXiv · Mar 31, 2026
Meiwen Ding, Song Xia, Chenqi Kong et al. · Nanyang Technological University
Embeds imperceptible adversarial prompts into images via visual perturbations to jailbreak closed-source multimodal LLMs
Input Manipulation Attack · Prompt Injection · multimodal · vision · nlp
As multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves them vulnerable to prompt injection attacks. Existing prompt injection methods predominantly rely on textual prompts or perceptible visual prompts that are observable by human users. In this work, we study imperceptible visual prompt injection against powerful closed-source MLLMs, where adversarial instructions are embedded in the visual modality. Our method adaptively embeds the malicious prompt into the input image via a bounded text overlay to provide semantic guidance. Meanwhile, an imperceptible visual perturbation is iteratively optimized to align the feature representations of the attacked image with those of the malicious visual and textual targets at both coarse- and fine-grained levels. Specifically, the visual target is instantiated as a text-rendered image and progressively refined during optimization to more faithfully represent the desired semantics and improve transferability. Extensive experiments on two multimodal understanding tasks across multiple closed-source MLLMs demonstrate the superior performance of our approach compared to existing methods.
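For intuition, here is a minimal sketch of the kind of feature-alignment optimization the abstract describes, using an open CLIP model as a stand-in surrogate encoder (the paper attacks closed-source MLLMs via transfer). The model choice, loss weighting, and PGD-style perturbation budget below are illustrative assumptions, not the authors' implementation, and the progressive refinement of the visual target is omitted.

```python
# Hedged sketch of imperceptible visual prompt injection via feature alignment.
# Assumes an open CLIP surrogate; hyperparameters are illustrative only.
import torch
import torch.nn.functional as F
from PIL import Image, ImageDraw
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def render_text_image(text, size=(224, 224)):
    """Render the malicious instruction as an image (the 'visual target')."""
    img = Image.new("RGB", size, "white")
    ImageDraw.Draw(img).text((10, 100), text, fill="black")
    return img

def embed_image(pixel_values):
    return F.normalize(model.get_image_features(pixel_values=pixel_values), dim=-1)

def inject(image: Image.Image, instruction: str, eps=8 / 255, steps=200, lr=1 / 255):
    # Textual target embedding (coarse semantic guidance).
    tok = processor(text=[instruction], return_tensors="pt", padding=True).to(device)
    text_target = F.normalize(model.get_text_features(**tok), dim=-1).detach()
    # Visual target: the instruction rendered as an image.
    vis = processor(images=render_text_image(instruction), return_tensors="pt")["pixel_values"].to(device)
    vis_target = embed_image(vis).detach()

    x = processor(images=image, return_tensors="pt")["pixel_values"].to(device)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        feats = embed_image(x + delta)
        # Pull the attacked image toward both the textual and the visual target.
        loss = -(feats @ text_target.T).mean() - (feats @ vis_target.T).mean()
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()   # signed-gradient step
            delta.clamp_(-eps, eps)           # imperceptibility budget (L_inf)
            delta.grad.zero_()
    # A full implementation would also project x + delta back to the valid pixel range.
    return (x + delta).detach()
```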
vlm · multimodal · transformer · Nanyang Technological University
benchmark · arXiv · Apr 4, 2026
Peijun Bao, Anwei Luo, Gang Pan et al. · Zhejiang University · Nanyang Technological University +4 more
Benchmark dataset and diffusion-based detector for localizing AI-manipulated activity segments seamlessly inserted into authentic videos
Output Integrity Attack · vision · multimodal
Temporal forgery localization aims to temporally identify manipulated segments in videos. Most existing benchmarks focus on appearance-level forgeries, such as face swapping and object removal. However, recent advances in video generation have driven the emergence of activity-level forgeries that modify human actions to distort event semantics, resulting in highly deceptive forgeries that critically undermine media authenticity and public trust. To address this gap, we introduce ActivityForensics, the first large-scale benchmark for localizing manipulated activities in videos. It contains over 6K forged video segments that are seamlessly blended into their video context, with visual consistency so high that they are almost indistinguishable from authentic content to the human eye. We further propose the Temporal Artifact Diffuser (TADiff), a simple yet effective baseline that exposes artifact cues through a diffusion-based feature regularizer. Based on ActivityForensics, we introduce comprehensive evaluation protocols covering intra-domain, cross-domain, and open-world settings, and benchmark a wide range of state-of-the-art forgery localizers to facilitate future research. The dataset and code are available at https://activityforensics.github.io.
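As a rough illustration of what a diffusion-based feature regularizer might look like in this setting, the toy sketch below perturbs per-frame features with Gaussian noise and scores each frame by its denoising residual, treating high-residual frames as candidate forged segments. The `FeatureDenoiser` module, noise schedule, and thresholding rule are hypothetical stand-ins, not the TADiff architecture.

```python
# Hedged sketch of a diffusion-style feature regularizer for temporal forgery
# localization; shapes, schedule, and scoring are illustrative assumptions.
import torch
import torch.nn as nn

class FeatureDenoiser(nn.Module):
    """Tiny denoiser over per-frame features of shape (T, D)."""
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, noisy_feats, t):
        # Condition on the (scalar) diffusion step by concatenation.
        t_embed = t.expand(noisy_feats.shape[0], 1)
        return self.net(torch.cat([noisy_feats, t_embed], dim=-1))

def diffuse(feats, t, betas):
    """Forward diffusion: mix clean features with Gaussian noise at step t."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    noise = torch.randn_like(feats)
    return alpha_bar.sqrt() * feats + (1 - alpha_bar).sqrt() * noise, noise

def artifact_scores(feats, model, betas, t=50):
    """Per-frame artifact cue: magnitude of the denoising residual.
    The intuition is that forged frames are reconstructed less faithfully."""
    noisy, _ = diffuse(feats, t, betas)
    recon = model(noisy, torch.tensor([[float(t)]]))  # predicts clean features
    return (recon - feats).norm(dim=-1)               # (T,) higher = more suspicious

# Usage with dummy per-frame features (e.g. from a frozen video backbone);
# a real detector would train the denoiser on authentic footage first.
betas = torch.linspace(1e-4, 0.02, 1000)
model = FeatureDenoiser()
frame_feats = torch.randn(300, 512)                   # 300 frames, 512-d features
scores = artifact_scores(frame_feats, model, betas)
suspect_frames = (scores > scores.mean() + scores.std()).nonzero().flatten()
```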
diffusion · transformer · Zhejiang University · Nanyang Technological University · Jiangxi University of Finance and Economics +3 more