defense 2026

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li ¹, Zehao Liu ¹, Xi Lin ¹, Qinghua Mao ¹, Yuliang Chen ¹, Haoyu Li ², Jun Wu ¹, Jianhua Li ¹, Xiu Su ³

¹ Shanghai Jiao Tong University

² University of Illinois Urbana-Champaign

³ Central South University

0 citations

Published on arXiv

2604.04060

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Reduces attack success rate by 78.9%, improves deceptive rate by 186%, and reduces attack efficiency by 167.9% compared to state-of-the-art defenses on EMRA benchmark

CoopGuard

Novel technique introduced

As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety concerns, especially those evolving over multi-round interactions. Existing defenses are largely reactive and struggle to adapt as adversaries refine strategies across rounds. In this work, we propose CoopGuard , a stateful multi-round LLM defense framework based on cooperative agents that maintains and updates an internal defense state to counter evolving attacks. It employs three specialized agents (Deferring Agent, Tempting Agent, and Forensic Agent) for complementary round-level strategies, coordinated by System Agent, which conditions decisions on the evolving defense state (interaction history) and orchestrates agents over time. To evaluate evolving threats, we introduce the EMRA benchmark with 5,200 adversarial samples across 8 attack types, simulating progressively LLM multi-round attacks. Experiments show that CoopGuard reduces attack success rate by 78.9% over state-of-the-art defenses, while improving deceptive rate by 186% and reducing attack efficiency by 167.9%, offering a more comprehensive assessment of multi-round defense. These results demonstrate that CoopGuard provides robust protection for LLMs in multi-round adversarial scenarios.

Key Contributions

CoopGuard: stateful multi-agent defense framework that maintains evolving defense state across rounds to counter adaptive jailbreak attacks
EMRA benchmark with 5,200 adversarial samples across 8 attack types simulating escalating multi-round attacks
Three-metric evaluation (attack success rate, deceptive rate, attack efficiency) demonstrating 78.9% ASR reduction, 186% DR improvement, and 167.9% AE reduction over state-of-the-art

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_timetargeted

Datasets

EMRA

Applications

llm safetychatbot securitymulti-round dialogue systems

Read PDF arXiv

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI

Cybersecurity AI: Hacking the AI Hackers via Prompt Injection

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

ceLLMate: Sandboxing Browser AI Agents

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Framework

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection