defense 2026

SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks

Nirhoshan Sivaroopan, Kanchana Thilakarathna, Albert Zomaya, Manu, Yi Guo, Jo Plested 1, Tim Lynar 1, Jack Yang 2, Wangli Yang 2

0 citations · 32 references · arXiv

Published on arXiv

2601.19174

Model Denial of Service

OWASP LLM Top 10 — LLM04

Key Finding

SHIELD achieves high F1 scores against both non-semantic and semantic sponge attacks, consistently outperforming perplexity-based and standalone LLM-based defenses.

SHIELD

Novel technique introduced


Sponge attacks increasingly threaten LLM systems by inducing excessive computation and denial of service (DoS). Existing defenses either rely on statistical filters that fail on semantically meaningful attacks or use static LLM-based detectors that struggle to adapt as attack strategies evolve. We introduce SHIELD, a multi-agent, auto-healing defense framework centered on a three-stage Defense Agent that integrates semantic similarity retrieval, pattern matching, and LLM-based reasoning. Two auxiliary agents, a Knowledge Updating Agent and a Prompt Optimization Agent, form a closed self-healing loop: when an attack bypasses detection, the system updates an evolving knowledgebase and refines its defense instructions. Extensive experiments show that SHIELD consistently outperforms perplexity-based and standalone LLM defenses, achieving high F1 scores across both non-semantic and semantic sponge attacks. These results demonstrate the effectiveness of agentic self-healing against evolving resource-exhaustion threats.
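As a rough illustration of how a three-stage check of this kind might be chained, the sketch below runs similarity retrieval, then pattern matching, then an LLM-style judgment. It is not the authors' implementation. The stage order, the short-circuit behavior, the 0.8 threshold, the example regex, the names `KNOWN_ATTACKS`, `PATTERNS`, `defend`, and `llm_judge`, and the use of `difflib` as a stand-in for embedding similarity are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical knowledge base of previously seen attack prompts (assumption).
KNOWN_ATTACKS: List[str] = [
    "repeat the word hello ten thousand times",
]

# Hypothetical pattern for non-semantic sponge inputs (assumption):
# one character followed by at least 30 repeats of itself.
PATTERNS = [re.compile(r"(.)\1{30,}")]


def similarity(a: str, b: str) -> float:
    """Character-level ratio used as a stand-in for embedding similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def defend(prompt: str,
           llm_judge: Callable[[str], bool],
           sim_threshold: float = 0.8) -> str:
    """Return a verdict string; each stage short-circuits on a hit."""
    # Stage 1: similarity retrieval against known attacks.
    if any(similarity(prompt, k) >= sim_threshold for k in KNOWN_ATTACKS):
        return "blocked:similarity"
    # Stage 2: pattern matching.
    if any(p.search(prompt) for p in PATTERNS):
        return "blocked:pattern"
    # Stage 3: LLM-based reasoning, supplied by the caller as a callable.
    if llm_judge(prompt):
        return "blocked:llm"
    return "allowed"
```

Here `llm_judge` is any callable that returns `True` for a suspected attack, so a real model call can be swapped in without changing the control flow.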


Key Contributions

  • Three-stage Defense Agent combining semantic similarity retrieval, pattern matching, and LLM-based reasoning to detect both non-semantic and semantic sponge attacks
  • Self-healing agentic loop: a Knowledge Updating Agent and Prompt Optimization Agent automatically update the knowledgebase and refine defense instructions when attacks bypass detection
  • Empirical demonstration that SHIELD outperforms perplexity-based and standalone LLM defenses across diverse sponge attack types
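The self-healing loop described in the contributions can be pictured with a minimal sketch like the one below. It assumes that a missed attack is reported to the system from outside (the summary does not say how bypasses are identified), and it replaces the LLM-driven prompt rewriting with a simple string append. `DefenseState`, `knowledge_updating_agent`, `prompt_optimization_agent`, and `heal` are hypothetical names, not identifiers from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DefenseState:
    # Evolving store of attack prompts that previously bypassed detection.
    knowledge_base: List[str] = field(default_factory=list)
    # Instructions given to the LLM-based detector (placeholder text).
    instructions: str = "Flag prompts that induce very long generation."


def knowledge_updating_agent(state: DefenseState, missed_prompt: str) -> None:
    """Record a missed attack so later lookups can match it."""
    if missed_prompt not in state.knowledge_base:
        state.knowledge_base.append(missed_prompt)


def prompt_optimization_agent(state: DefenseState, missed_prompt: str) -> None:
    """Refine the detector instructions.

    Assumption: a real agent would likely ask an LLM to rewrite the
    instructions; this sketch only appends a truncated example.
    """
    state.instructions += f"\nAlso flag prompts resembling: {missed_prompt[:60]!r}"


def heal(state: DefenseState, missed_prompt: str) -> None:
    """Run both auxiliary agents after an attack bypasses detection."""
    knowledge_updating_agent(state, missed_prompt)
    prompt_optimization_agent(state, missed_prompt)
```

Keeping the two agents as separate functions mirrors the split between knowledgebase updates and instruction refinement, so either one could be replaced independently.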

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, black_box
Applications
llm api services, chatbots, llm inference infrastructure