defense 2025

Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems

0 citations

Published on arXiv

2508.09230

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Cowpox provably limits the expected number of agent infections in a VLM-based multi-agent system and empirically reduces jailbreak propagation, providing certified robustness against adaptive attacks.

Cowpox

Novel technique introduced

Vision Language Model (VLM)-based agents are stateful, autonomous entities capable of perceiving and interacting with their environments through vision and language. Multi-agent systems comprise specialized agents that collaborate to solve a complex task. A core security property is robustness: the system should maintain its integrity under adversarial attacks. However, existing multi-agent systems are not designed with robustness in mind, so a successful exploit against one agent can spread to and infect other agents, undermining the assurance of the entire system. To address this, we propose a new defense approach, Cowpox, to provably enhance the robustness of multi-agent systems. It incorporates a distributed mechanism that improves the recovery rate of agents by limiting the expected number of infections spread to other agents. The core idea is to generate and distribute a special cure sample that immunizes an agent against the attack before exposure and helps already-infected agents recover. We demonstrate the effectiveness of Cowpox empirically and provide theoretical robustness guarantees.

Key Contributions

Cowpox: a distributed defense mechanism that generates and propagates 'cure samples' to immunize VLM agents before exposure and recover already-infected agents in a multi-agent system
Formal modeling of infectious attack spread in multi-agent systems with theoretical robustness guarantees bounding the expected number of agent infections
Empirical demonstration of effectiveness against infectious jailbreak attacks with certified defense guarantees against adaptive adversaries
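
The spread-and-cure dynamic described in the contributions above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's formal model or the Cowpox algorithm: agents are paired at random each round, an infected agent passes the adversarial sample to a susceptible partner with probability `beta`, and an agent holding the cure sample passes it to any partner with probability `gamma`. The function name `simulate`, the parameters `beta`, `gamma`, and `cure_round`, and the choice of agent 1 as the first cure holder are all hypothetical.

```python
import random


def simulate(n_agents=64, rounds=30, beta=0.9, gamma=0.9,
             use_cure=True, cure_round=3, seed=0):
    """Toy epidemic-style model (an assumption, not the paper's model).

    States: "S" susceptible, "I" infected (carries the adversarial sample),
    "C" cured/immune (carries the cure sample).
    Returns the number of infected agents after each round.
    """
    rng = random.Random(seed)
    state = ["S"] * n_agents
    state[0] = "I"  # a single initially compromised agent
    history = []
    for t in range(rounds):
        if use_cure and t == cure_round:
            # Hypothetical trigger: agent 1 obtains a cure sample at this round.
            state[1] = "C"
        order = list(range(n_agents))
        rng.shuffle(order)
        nxt = state[:]
        # Each agent takes part in at most one pairwise exchange per round.
        for a, b in zip(order[::2], order[1::2]):
            for src, dst in ((a, b), (b, a)):
                if state[src] == "C" and state[dst] != "C":
                    # The cure immunizes susceptible agents and recovers infected ones.
                    if rng.random() < gamma:
                        nxt[dst] = "C"
                elif state[src] == "I" and state[dst] == "S":
                    if rng.random() < beta:
                        nxt[dst] = "I"
        state = nxt
        history.append(state.count("I"))
    return history


if __name__ == "__main__":
    print("no cure:  ", simulate(use_cure=False))
    print("with cure:", simulate(use_cure=True))
```

Running the script prints the per-round infected counts for both settings. In this toy model the count never decreases without a cure, while with a cure it can fall once cure holders start meeting infected agents. The sketch only illustrates the qualitative idea of a distributed cure; the bounds on expected infections are the subject of the paper's own analysis.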

🛡️ Threat Analysis

Details

Domains

multimodal, nlp

Model Types

vlm, multimodal, llm

Threat Tags

inference_time, black_box

Applications

vlm-based multi-agent systems, autonomous agent pipelines

Similar Papers

Who Grants the Agent Power? Defending Against Instruction Injection via Task-Centric Access Control

Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Measuring the Security of Mobile LLM Agents under Adversarial Prompts from Untrusted Third-Party Channels

Enhancing Reliability in LLM-Integrated Robotic Systems: A Unified Approach to Security and Safety

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents