Published on arXiv

2508.09230

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Cowpox provably limits the expected number of agent infections in a VLM-based multi-agent system and empirically reduces jailbreak propagation, providing certified robustness against adaptive attacks.

Cowpox

Novel technique introduced


Vision Language Model (VLM)-based agents are stateful, autonomous entities that perceive and interact with their environments through vision and language. Multi-agent systems comprise specialized agents that collaborate to solve a complex task. A core security property is robustness: the system should maintain its integrity under adversarial attack. However, existing multi-agent systems are not designed with robustness in mind, so a successful exploit against one agent can spread to and infect other agents, undermining the assurance of the entire system. To address this, we propose Cowpox, a new defense approach that provably enhances the robustness of multi-agent systems. It incorporates a distributed mechanism that improves the recovery rate of agents by limiting the expected number of infections passed on to other agents. The core idea is to generate and distribute a special cure sample that immunizes an agent against the attack before exposure and helps already infected agents recover. We demonstrate the effectiveness of Cowpox empirically and provide theoretical robustness guarantees.
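To make the spread-versus-cure dynamic in the abstract concrete, here is a minimal toy simulation. It is a sketch under stated assumptions, not the paper's algorithm: agents are assumed to chat in random pairs each round, and the function name `simulate` and the parameters `infect_rate`, `cure_rate`, and `cure_start_round` are hypothetical. How Cowpox actually generates a cure sample is not modeled; the cure is simply treated as a state that is passed along in the same way as the infection.

```python
import random


def simulate(num_agents=64, rounds=30, infect_rate=0.9, cure_rate=0.9,
             cure_start_round=5, seed=0):
    """Toy pairwise-chat model (an assumption, not the paper's method).

    Returns the number of infected agents after each round.
    """
    rng = random.Random(seed)
    # "S" = susceptible, "I" = infected, "C" = holds the cure sample (immune)
    state = ["S"] * num_agents
    state[0] = "I"  # a single initially compromised agent
    history = []
    for t in range(rounds):
        # Hypothetical defender: one agent starts carrying the cure sample.
        if cure_start_round is not None and t == cure_start_round:
            state[num_agents - 1] = "C"
        # Randomly pair agents for this chat round (odd agent out sits idle).
        order = list(range(num_agents))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            before = {a: state[a], b: state[b]}  # snapshot before the chat
            for src, dst in ((a, b), (b, a)):
                if before[src] == "C" and before[dst] != "C":
                    # The cure immunizes susceptible agents and recovers infected ones.
                    if rng.random() < cure_rate:
                        state[dst] = "C"
                elif before[src] == "I" and before[dst] == "S":
                    # The infection spreads only to susceptible agents.
                    if rng.random() < infect_rate:
                        state[dst] = "I"
        history.append(state.count("I"))
    return history


if __name__ == "__main__":
    print("no cure  :", simulate(cure_start_round=None))
    print("with cure:", simulate(cure_start_round=5))
```

Running the script prints the infected count per round with and without the hypothetical cure carrier, which makes it easy to compare the two trajectories for different rates and group sizes.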


Key Contributions

  • Cowpox: a distributed defense mechanism that generates and propagates 'cure samples' to immunize VLM agents before exposure and recover already-infected agents in a multi-agent system
  • Formal modeling of infectious attack spread in multi-agent systems with theoretical robustness guarantees bounding the expected number of agent infections
  • Empirical demonstration of effectiveness against infectious jailbreak attacks with certified defense guarantees against adaptive adversaries
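As a rough illustration of what "bounding the expected number of agent infections" can look like, the following is a generic epidemic-style mean-field sketch. It is not the paper's formal model: the symbols $s_t$, $i_t$, $c_t$ (expected fractions of susceptible, infected, and cure-carrying agents), the transmission probability $\beta$, and the cure-transfer probability $\kappa$ are all assumptions introduced here, as is the premise that agents chat in uniformly random pairs in a large population.

```latex
% Assumed mean-field updates per chat round, with s_t + i_t + c_t = 1.
\begin{align}
  c_{t+1} &= c_t + \kappa\, c_t\,(1 - c_t), \\
  i_{t+1} &= i_t + \beta\, i_t\, s_t - \kappa\, c_t\, i_t .
\end{align}
% The infected fraction shrinks in a round whenever
%   \beta\, s_t < \kappa\, c_t ,
% and since c_t grows logistically toward 1 once c_t > 0,
% this condition eventually holds under these assumptions.
```

In this toy model the cure fraction grows regardless of how the infection behaves, so the infection can only expand during the early window in which $\beta s_t > \kappa c_t$. That is the intuition behind a bound on total infections, under the assumptions listed above.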

Details

Domains
multimodal, nlp
Model Types
vlm, multimodal, llm
Threat Tags
inference_time, black_box
Applications
vlm-based multi-agent systems, autonomous agent pipelines