Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
Yutong Wu 1, Jie Zhang 2, Yiming Li 1, Chao Zhang 3, Qing Guo 2, Nils Lukas 4, Tianwei Zhang 1
Published on arXiv
2508.09230
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
Cowpox provably limits the expected number of agent infections in a VLM-based multi-agent system and empirically reduces jailbreak propagation, providing certified robustness against adaptive attacks.
Cowpox
Novel technique introduced
Vision Language Model (VLM)-based agents are stateful, autonomous entities capable of perceiving and interacting with their environments through vision and language. Multi-agent systems comprise specialized agents that collaborate to solve a complex task. A core security property is robustness: the system should maintain its integrity under adversarial attacks. However, existing multi-agent systems are not designed with robustness in mind, so a successful exploit against one agent can spread to and infect other agents, undermining the entire system's assurance. To address this, we propose a new defense approach, Cowpox, to provably enhance the robustness of multi-agent systems. It incorporates a distributed mechanism that improves the recovery rate of agents by limiting the expected number of infections passed on to other agents. The core idea is to generate and distribute a special cure sample that immunizes an agent against the attack before exposure and helps already-infected agents recover. We demonstrate the effectiveness of Cowpox empirically and provide theoretical robustness guarantees.
Key Contributions
- Cowpox: a distributed defense mechanism that generates and propagates 'cure samples' to immunize VLM agents before exposure and recover already-infected agents in a multi-agent system
- Formal modeling of infectious attack spread in multi-agent systems with theoretical robustness guarantees bounding the expected number of agent infections
- Empirical demonstration of effectiveness against infectious jailbreak attacks with certified defense guarantees against adaptive adversaries
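To make the spread-versus-cure dynamic concrete, the following is a minimal toy simulation, not the paper's formal model. It assumes agents are randomly paired each round, that an infected agent passes the attack to a susceptible partner with probability `beta`, and that an agent carrying a cure sample passes it to any partner without one with probability `gamma`, after which that partner is treated as immune. The function name, state labels, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate(n_agents=64, n_infected=1, n_cure=2,
             beta=0.9, gamma=0.9, rounds=40, seed=0):
    """Toy epidemic sketch (an assumption, not the paper's model).

    States: "S" susceptible, "I" infected, "C" holds a cure sample (immune).
    Returns a list of (susceptible, infected, cured) counts, one per round.
    """
    if n_infected + n_cure > n_agents:
        raise ValueError("initial infected + cure carriers exceed n_agents")
    rng = random.Random(seed)
    states = (["I"] * n_infected + ["C"] * n_cure
              + ["S"] * (n_agents - n_infected - n_cure))

    def counts():
        return (states.count("S"), states.count("I"), states.count("C"))

    history = [counts()]
    for _ in range(rounds):
        # Randomly pair agents for one chat round (odd agent out sits idle).
        order = list(range(n_agents))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            pair = {states[a], states[b]}
            if "C" in pair and len(pair) == 2:
                # Cure carrier meets a susceptible or infected agent.
                if rng.random() < gamma:
                    states[a] = states[b] = "C"
            elif pair == {"I", "S"}:
                # Infected agent meets a susceptible agent.
                if rng.random() < beta:
                    states[a] = states[b] = "I"
        history.append(counts())
    return history


if __name__ == "__main__":
    for label, kwargs in [("no cure", {"n_cure": 0}), ("with cure", {})]:
        s, i, c = simulate(**kwargs)[-1]
        print(f"{label}: susceptible={s} infected={i} cured={c}")
```

In this sketch the cured state is absorbing, so once a cure sample is present the infected population can only be drained by it. Comparing the `n_cure=0` run with the default run shows qualitatively why distributing cure samples can cap the number of infections, under the pairing and transmission assumptions stated above.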