attack 2026

STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming

MinJae Jung , YongTaek Lim , Chaeyun Kim , Junghwan Kim , Kihyun Kim , Minwoo Kim

0 citations

Published on arXiv

2604.18976

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves a significantly higher attack success rate (ASR) than state-of-the-art baselines, at lower computational cost, through community-guided strategy sampling.

Novel technique introduced

STAR-Teaming


While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses. This paper introduces STAR-Teaming, a novel black-box framework for automated red teaming that effectively generates such prompts. STAR-Teaming integrates a Multi-Agent System (MAS) with a Strategy-Response Multiplex Network and employs network-driven optimization to sample effective attack strategies. This network-based approach recasts the intractable high-dimensional embedding space into a tractable structure, yielding two key advantages: it enhances the interpretability of the LLM's strategic vulnerabilities, and it streamlines the search for effective strategies by organizing the search space into semantic communities, thereby preventing redundant exploration. Empirical results demonstrate that STAR-Teaming significantly surpasses existing methods, achieving a higher attack success rate (ASR) at a lower computational cost. Extensive experiments validate the effectiveness and explainability of the Multiplex Network. The code is available at https://github.com/selectstar-ai/STAR-Teaming-paper.


Key Contributions

  • Novel multiplex network framework that maps attack strategies to LLM response patterns for interpretable vulnerability discovery
  • Network-driven optimization for efficient strategy sampling organized into semantic communities, avoiding redundant exploration
  • Multi-agent system achieving higher attack success rate at lower computational cost than existing methods
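
The idea of sampling strategies by community, as described in the contributions above, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the strategy labels (`s1` to `s5`), the hard-coded similarity edges, the use of connected components as a stand-in for community detection, and the smoothed success-rate weighting. The names `find_communities` and `CommunitySampler` are hypothetical.

```python
import random
from collections import defaultdict


def find_communities(nodes, edges):
    """Group nodes into connected components.

    Assumption: connected components are a simple stand-in for whatever
    community detection the real framework applies to its network.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, communities = set(), []
    for node in nodes:
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(sorted(component))
    return communities


class CommunitySampler:
    """Pick a community by observed success rate, then a strategy inside it."""

    def __init__(self, communities, rng=None):
        self.communities = communities
        self.successes = [0] * len(communities)
        self.attempts = [0] * len(communities)
        self.rng = rng or random.Random()

    def weights(self):
        # Laplace-smoothed success rate, so untried communities keep a
        # nonzero chance of being explored (an illustrative choice).
        return [(s + 1) / (a + 2) for s, a in zip(self.successes, self.attempts)]

    def sample(self):
        index = self.rng.choices(
            range(len(self.communities)), weights=self.weights()
        )[0]
        return index, self.rng.choice(self.communities[index])

    def update(self, index, success):
        # Record the outcome reported by an external evaluator.
        self.attempts[index] += 1
        self.successes[index] += int(success)


# Placeholder strategy labels and similarity edges (hypothetical data).
strategies = ["s1", "s2", "s3", "s4", "s5"]
similar = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]

communities = find_communities(strategies, similar)
sampler = CommunitySampler(communities, rng=random.Random(0))
community_index, strategy = sampler.sample()
sampler.update(community_index, success=True)
```

Grouping strategies first means feedback from one attempt updates the weight of every similar strategy at once, which is one plausible way to avoid re-exploring near-duplicates.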

Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, untargeted
Applications
llm safety evaluation, automated red teaming