defense · arXiv · Aug 5, 2025
Xinwei Liu, Xiaojun Jia, Yuan Xun et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +3 more
Protects geolocation privacy against VLMs by applying adversarial image perturbations with feature disentanglement and scale-adaptive optimization
Input Manipulation Attack · Prompt Injection · vision · multimodal
Vision-Language Models (VLMs) such as GPT-4o now demonstrate a remarkable ability to infer users' locations from publicly shared images, posing a substantial risk to geoprivacy. Although adversarial perturbations offer a potential defense, current methods are ill-suited for this scenario: they often perform poorly on high-resolution images and under low perturbation budgets, and may introduce irrelevant semantic content. To address these limitations, we propose GeoShield, a novel adversarial framework designed for robust geoprivacy protection in real-world scenarios. GeoShield comprises three key modules: a feature disentanglement module that separates geographical and non-geographical information, an exposure element identification module that pinpoints geo-revealing regions within an image, and a scale-adaptive enhancement module that jointly optimizes perturbations at both global and local levels to ensure effectiveness across resolutions. Extensive experiments on challenging benchmarks show that GeoShield consistently surpasses prior methods in black-box settings, achieving strong privacy protection with minimal impact on visual or semantic quality. To our knowledge, this work is the first to explore adversarial perturbations for defending against geolocation inference by advanced VLMs, providing a practical and effective solution to escalating privacy concerns.
vlm
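The abstract describes GeoShield's three modules only at a high level, but the scale-adaptive idea maps naturally onto a PGD-style optimization loop. The sketch below is a minimal illustration, not the paper's implementation: `geo_feat_fn` is a hypothetical stand-in for a surrogate geographic feature extractor (what the disentanglement module would provide), and `regions` stands in for the geo-revealing boxes an exposure-identification step would flag.

```python
import torch
import torch.nn.functional as F

def geo_perturb(image, geo_feat_fn, regions, eps=4/255, alpha=1/255, steps=50):
    """PGD-style sketch of scale-adaptive perturbation: push the image's
    geographic features away from their originals at both the global scale
    and within geo-revealing regions, under an L-inf budget.

    Assumptions (illustrative, not the paper's API):
      - geo_feat_fn maps a (B, C, H, W) tensor of any spatial size to
        (B, D) geographic features, resizing internally;
      - regions is a list of (y0, y1, x0, x1) geo-revealing boxes.
    """
    with torch.no_grad():
        target_global = geo_feat_fn(image)  # features to move away from
        target_local = [geo_feat_fn(image[..., y0:y1, x0:x1])
                        for (y0, y1, x0, x1) in regions]

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        # Global term: reduce similarity of whole-image geo features.
        loss = F.cosine_similarity(geo_feat_fn(adv), target_global, dim=-1).mean()
        # Local terms: same objective on each geo-revealing crop, so the
        # perturbation stays effective after cropping or rescaling.
        for (y0, y1, x0, x1), t in zip(regions, target_local):
            crop = adv[..., y0:y1, x0:x1]
            loss = loss + F.cosine_similarity(geo_feat_fn(crop), t, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend: minimize similarity
            delta.clamp_(-eps, eps)             # enforce the L-inf budget
            delta.grad.zero_()
    return (image + delta).detach().clamp(0, 1)
```

In a black-box setting the perturbation would be crafted against such a surrogate and transferred to the target VLM; the joint global-plus-local objective is what the abstract calls scale-adaptive enhancement.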
attack · arXiv · Aug 6, 2025
Yuan Xun, Xiaojun Jia, Xinwei Liu et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more
EmoAgent jailbreaks multimodal reasoning models by using exaggerated emotional prompts to override safety protocols during deep-thinking stages
Prompt Injection · multimodal · nlp
We observe that multimodal large reasoning models (MLRMs) oriented toward human-centric service are highly susceptible to user emotional cues during the deep-thinking stage, often overriding safety protocols or built-in safety checks under high emotional intensity. Inspired by this key insight, we propose EmoAgent, an autonomous adversarial emotion-agent framework that orchestrates exaggerated affective prompts to hijack reasoning pathways. Even when visual risks are correctly identified, models can still produce harmful completions through emotional misalignment. We further identify persistent high-risk failure modes in transparent deep-thinking scenarios, such as MLRMs generating harmful reasoning masked behind seemingly safe responses. These failures expose misalignments between internal inference and surface-level behavior, eluding existing content-based safeguards. To quantify these risks, we introduce three metrics: (1) Risk-Reasoning Stealth Score (RRSS) for harmful reasoning beneath benign outputs; (2) Risk-Visual Neglect Rate (RVNR) for unsafe completions despite visual risk recognition; and (3) Refusal Attitude Inconsistency (RAIC) for refusal instability across prompt variants. Extensive experiments on advanced MLRMs demonstrate the effectiveness of EmoAgent and reveal deeper emotional cognitive misalignments in model safety behavior.
vlm · llm
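The abstract names the three metrics but gives no formulas; the sketch below is one plausible reading, using an illustrative per-case record schema (`Record` and its field names are assumptions, not the paper's evaluation format): RRSS as the rate of harmful reasoning beneath benign answers, RVNR as a rate conditioned on recognized visual risk, and RAIC as per-group refusal disagreement across prompt variants.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One evaluation outcome; fields are illustrative, not the paper's schema."""
    reasoning_harmful: bool     # harmful content in the transparent reasoning trace
    answer_harmful: bool        # harmful content in the final response
    visual_risk_detected: bool  # reasoning correctly flagged the image risk
    refused: bool               # model refused this prompt variant
    prompt_group: str           # id grouping variants of the same underlying prompt

def rrss(records):
    """Risk-Reasoning Stealth Score: share of cases with harmful
    reasoning hidden beneath a benign-looking final answer."""
    hits = [r for r in records if r.reasoning_harmful and not r.answer_harmful]
    return len(hits) / len(records)

def rvnr(records):
    """Risk-Visual Neglect Rate: among cases where the visual risk was
    recognized, how often the completion is still unsafe."""
    seen = [r for r in records if r.visual_risk_detected]
    return sum(r.answer_harmful for r in seen) / max(len(seen), 1)

def raic(records):
    """Refusal Attitude Inconsistency: fraction of prompt groups whose
    variants draw a mix of refusals and non-refusals."""
    groups = {}
    for r in records:
        groups.setdefault(r.prompt_group, set()).add(r.refused)
    return sum(len(v) > 1 for v in groups.values()) / max(len(groups), 1)
```

Whatever the paper's exact definitions, the structure is the point: RRSS and RVNR compare the reasoning trace against the final answer, which is why purely output-level content filters miss these failure modes.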