attack arXiv Feb 15, 2026 · 7w ago
Xiaojun Jia, Jie Liao, Simeng Qin et al. · Nanyang Technological University · Chongqing University +4 more
Automated framework crafts stealthy skill-based prompt injections against LLM coding agents using closed-loop refinement agents
Prompt Injection Insecure Plugin Design nlp
Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented behaviors. This abstraction introduces an under-measured attack surface: skill-based prompt injection, where poisoned skills can steer agents away from user intent and safety policies. In practice, naive injections often fail because the malicious intent is too explicit or drifts too far from the original skill, leading agents to ignore or refuse them; existing attacks are also largely hand-crafted. We propose the first automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills in a realistic tool environment, and an Evaluate Agent that logs action traces (e.g., tool calls and file operations) and verifies whether targeted malicious behaviors occurred. We also propose a malicious payload hiding strategy that conceals adversarial operations in auxiliary scripts while injecting optimized inducement prompts to trigger tool execution. Extensive experiments across diverse coding-agent settings and real-world software engineering tasks show that our method consistently achieves high attack success rates under realistic settings.
llm Nanyang Technological University · Chongqing University · BraneMatrix AI +3 more
defense arXiv Aug 5, 2025 · Aug 2025
Xinwei Liu, Xiaojun Jia, Yuan Xun et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +3 more
Defends geolocation privacy against VLMs by applying adversarial image perturbations with feature disentanglement and scale-adaptive optimization
Input Manipulation Attack Prompt Injection visionmultimodal
Vision-Language Models (VLMs) such as GPT-4o now demonstrate a remarkable ability to infer users' locations from public shared images, posing a substantial risk to geoprivacy. Although adversarial perturbations offer a potential defense, current methods are ill-suited for this scenario: they often perform poorly on high-resolution images and low perturbation budgets, and may introduce irrelevant semantic content. To address these limitations, we propose GeoShield, a novel adversarial framework designed for robust geoprivacy protection in real-world scenarios. GeoShield comprises three key modules: a feature disentanglement module that separates geographical and non-geographical information, an exposure element identification module that pinpoints geo-revealing regions within an image, and a scale-adaptive enhancement module that jointly optimizes perturbations at both global and local levels to ensure effectiveness across resolutions. Extensive experiments on challenging benchmarks show that GeoShield consistently surpasses prior methods in black-box settings, achieving strong privacy protection with minimal impact on visual or semantic quality. To our knowledge, this work is the first to explore adversarial perturbations for defending against geolocation inference by advanced VLMs, providing a practical and effective solution to escalating privacy concerns.
vlm Chinese Academy of Sciences · University of Chinese Academy of Sciences · Nanyang Technological University +2 more
attack arXiv Aug 7, 2025 · Aug 2025
Qi Guo, Xiaojun Jia, Shanmin Pang et al. · Xi’an Jiaotong University · A*STAR +4 more
Physical adversarial patch attack on MLLM-based autonomous driving using SVD alignment and semantic mask optimization to steer perception and planning outputs
Input Manipulation Attack Prompt Injection visionmultimodal
Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks, particularly adversarial patch attacks, which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models and perform poorly when transferred to MLLM-based systems due to the latter's complex architectures and reasoning abilities. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based local alignment loss with patch-guided crop-resize to improve transferability, and a potential field-based mask refinement method. Extensive experiments across open-source, commercial, and reasoning-capable MLLMs demonstrate that PhysPatch significantly outperforms prior methods in steering MLLM-based AD systems toward target-aligned perception and planning outputs. Moreover, PhysPatch consistently places adversarial patches in physically feasible regions of AD scenes, ensuring strong real-world applicability and deployability.
vlm multimodal Xi’an Jiaotong University · A*STAR · Nanyang Technological University +3 more