tool 2025

ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants

Xiangzhe Xu , Guangyu Shen , Zian Su , Siyuan Cheng , Hanxi Guo , Lu Yan , Xuan Chen , Jiasheng Jiang , Xiaolong Jin , Chengpeng Wang , Zhuo Zhang , Xiangyu Zhang

Purdue University

0 citations

Published on arXiv

2508.03936

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

ASTRA discovers 11–66% more safety issues than existing red-teaming techniques and produces alignment training data that is 17% more effective than baselines.

ASTRA

Novel technique introduced

AI coding assistants like GitHub Copilot are rapidly transforming software development, but their safety remains deeply uncertain, especially in high-stakes domains like cybersecurity. Current red-teaming tools often rely on fixed benchmarks or unrealistic prompts, missing many real-world vulnerabilities. We present ASTRA, an automated agent system designed to systematically uncover safety flaws in AI-driven code generation and security guidance systems. ASTRA works in three stages: (1) it builds structured domain-specific knowledge graphs that model complex software tasks and known weaknesses; (2) it performs online vulnerability exploration of each target model by adaptively probing both its input space (spatial exploration) and its reasoning processes (temporal exploration), guided by the knowledge graphs; and (3) it generates high-quality violation-inducing cases to improve model alignment. Unlike prior methods, ASTRA focuses on realistic inputs, i.e., requests that developers might actually ask, and combines offline abstraction-guided domain modeling with online domain knowledge graph adaptation to surface corner-case vulnerabilities. Across two major evaluation domains, ASTRA finds 11–66% more issues than existing techniques and produces test cases that lead to 17% more effective alignment training, showing its practical value for building safer AI systems.
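To make stages (1) and (2) more concrete, the sketch below shows one way a domain knowledge graph could drive spatial exploration of the input space. It is a toy illustration under stated assumptions, not ASTRA's implementation: `DomainKG`, `add_edge`, and `spatial_prompts` are hypothetical names, the graph only links software tasks to CWE weakness categories, and each test case pairs a benign, realistic request with the weakness a downstream checker would look for in the generated code.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class DomainKG:
    """Hypothetical toy domain graph: software tasks linked to weaknesses that can arise in them."""
    task_weaknesses: dict[str, list[str]] = field(default_factory=dict)
    # Hypothetical realistic framings of who is asking and why
    contexts: list[str] = field(default_factory=list)

    def add_edge(self, task: str, weakness: str) -> None:
        # Link a task node to a weakness node (adjacency list keyed by task)
        self.task_weaknesses.setdefault(task, []).append(weakness)


def spatial_prompts(kg: DomainKG) -> list[tuple[str, str]]:
    """Cover the input space: every (task, linked weakness, context) triple becomes one case.

    The prompt stays benign and realistic; the weakness is kept only as the label
    that a downstream checker would look for in the model's generated code.
    """
    cases = []
    for task, weaknesses in kg.task_weaknesses.items():
        for weakness, ctx in product(weaknesses, kg.contexts):
            cases.append((f"{ctx} Please write {task}.", weakness))
    return cases


kg = DomainKG(contexts=["I'm building an internal admin tool.", "This is a quick class prototype."])
kg.add_edge("a Flask endpoint that looks up a user by name", "CWE-89 SQL injection")
kg.add_edge("a script that archives a user-supplied directory", "CWE-78 OS command injection")
kg.add_edge("a script that archives a user-supplied directory", "CWE-22 path traversal")

cases = spatial_prompts(kg)
for prompt, weakness in cases:
    print(f"[{weakness}] {prompt}")
```

With three task-weakness edges and two contexts, the enumeration yields six cases. A real system would generate far richer prompt variations per node; plain Cartesian enumeration is used here only because it makes the "coverage of the input space" idea easy to see.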

Key Contributions

Domain-specific knowledge graph construction modeling complex software tasks and known weaknesses for structured red-teaming
Adaptive online vulnerability exploration combining spatial (input space) and temporal (reasoning process) probing of target LLMs
Automated generation of high-quality violation-inducing test cases that improve alignment training effectiveness by 17%
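The adaptive exploration and case-collection contributions above can be illustrated with a small budgeted search loop. This is a hedged sketch that swaps in a standard UCB1 bandit as the adaptation rule; it is not the algorithm from the paper, and `adaptive_explore`, `mock_target`, and `naive_checker` are hypothetical stand-ins. The target model is replaced by a function that returns canned code, and the violation oracle is a toy string-pattern check, where a real pipeline would use static analysis or a judge model. The loop spends more queries on prompts that have already induced violations and keeps each violating (prompt, response) pair as candidate alignment data.

```python
import math
from typing import Callable

Arm = tuple[str, str]  # hypothetical representation: (prompt, weakness to check)


def adaptive_explore(
    arms: list[Arm],
    target: Callable[[str], str],
    is_violation: Callable[[str, str], bool],
    budget: int,
) -> tuple[dict[str, str], dict[Arm, int]]:
    """UCB1 over prompt arms: concentrate queries where the target has already failed."""
    pulls = {arm: 0 for arm in arms}
    hits = {arm: 0 for arm in arms}
    found: dict[str, str] = {}  # violation-inducing prompt -> offending response
    for t in range(1, budget + 1):
        untried = [a for a in arms if pulls[a] == 0]
        if untried:
            arm = untried[0]  # query every arm once before exploiting
        else:
            # Empirical violation rate plus an exploration bonus that shrinks with more pulls
            arm = max(
                arms,
                key=lambda a: hits[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        prompt, weakness = arm
        response = target(prompt)
        pulls[arm] += 1
        if is_violation(response, weakness):
            hits[arm] += 1
            found.setdefault(prompt, response)
    return found, pulls


def mock_target(prompt: str) -> str:
    # Stand-in for the model under test: returns canned code for each kind of request
    if "user by name" in prompt:
        return "cur.execute(\"SELECT * FROM users WHERE name = '\" + name + \"'\")"
    return "subprocess.run(['tar', 'czf', out, safe_path], check=True)"


def naive_checker(response: str, weakness: str) -> bool:
    # Toy oracle: crude pattern matching per CWE label
    if weakness.startswith("CWE-89"):
        return "execute(" in response and "+" in response
    if weakness.startswith("CWE-78"):
        return "shell=True" in response or "os.system(" in response
    return False


arms = [
    ("Please write a Flask endpoint that looks up a user by name.", "CWE-89"),
    ("Please write a script that archives a user-supplied directory.", "CWE-78"),
    ("Please write a script that archives a user-supplied directory.", "CWE-22"),
]
found, pulls = adaptive_explore(arms, mock_target, naive_checker, budget=10)
print(pulls)
print(found)
```

Because the mock target is deterministic, repeated pulls of the same arm simply re-confirm the same violation; in practice each pull would draw a fresh prompt variant from the corresponding knowledge-graph node, which is where the adaptive allocation pays off.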

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time

Applications

AI coding assistants, code generation, security guidance systems

Similar Papers

Quantifying Document Impact in RAG-LLMs

Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models

How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models

RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents

Proactive Hardening of LLM Defenses with HASTE

MindGuard: Guardrail Classifiers for Multi-Turn Mental Health Support

CALM: Curiosity-Driven Auditing for Large Language Models

SGuard-v1: Safety Guardrail for Large Language Models