Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models
Published on arXiv: 2601.00848
- Excessive Agency (OWASP LLM Top 10 — LLM08)
- Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding: Accuracy improves from 42.86% to 74.29% (a 31.4-point gain, p < 0.001) across three QLoRA fine-tuning iterations with targeted adversarial augmentation on resource-constrained ARM64 hardware.
Novel Technique: Iterative QLoRA fine-tuning of Foundation-Sec-8B
We present an openly documented methodology for fine-tuning language models to detect temporal attack patterns in multi-agent AI workflows via OpenTelemetry trace analysis. We curate a dataset of 80,851 examples drawn from 18 public cybersecurity sources and 35,026 synthetic OpenTelemetry traces, and apply iterative QLoRA fine-tuning on resource-constrained ARM64 hardware (NVIDIA DGX Spark) across three training iterations with strategic data augmentation. Accuracy on our custom benchmark improves from 42.86% to 74.29%, a statistically significant 31.4-point gain; targeted examples addressing specific knowledge gaps outperform indiscriminate dataset scaling. Key contributions include: (1) a synthetic trace generation methodology for multi-agent coordination attacks and regulatory violations, (2) empirical evidence that training data composition fundamentally determines model behavior, and (3) a complete open release of datasets, training scripts, and evaluation benchmarks on HuggingFace. While practical deployment still requires human oversight because of the residual false positive rate, this work establishes the first reproducible framework enabling practitioners to build custom agentic security models adapted to their own threat landscapes.
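The QLoRA setup described above might be configured along the following lines. This is a sketch only: every hyperparameter (rank, alpha, dropout, target modules) is an illustrative assumption, not the paper's reported configuration.

```python
# Sketch of a QLoRA configuration for a Foundation-Sec-8B base model.
# All hyperparameter values below are placeholder assumptions.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization keeps the 8B base model within the memory
# budget of resource-constrained ARM64 hardware such as DGX Spark.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projections; only these small
# adapter weights are trained, the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

In an iterative scheme like the one described, the base model would be loaded with `quantization_config=bnb_config`, wrapped with `peft.get_peft_model(model, lora_config)`, and retrained after each round of targeted data augmentation.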
Key Contributions
- First open methodology for fine-tuning LLMs on agentic workflow security using OpenTelemetry trace analysis, with all datasets, scripts, and benchmarks released on HuggingFace
- Synthetic trace generation methodology producing 35,026 realistic multi-agent attack traces covering coordination attacks, stealth evasion, and regulatory violations (GDPR, HIPAA, PCI-DSS)
- Empirical evidence that training data composition determines model behavior — a 90% attack-focused dataset produced a 66.7% false positive rate that resisted prompt-engineering mitigations and required architectural remediation
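To make the trace-based framing above concrete, here is a minimal sketch of a synthetic multi-agent trace and a simple temporal heuristic over it. The span layout loosely follows OpenTelemetry conventions, and the burst-detection rule is a hypothetical excessive-agency signal for illustration, not the paper's fine-tuned model or dataset.

```python
# Hypothetical sketch: OpenTelemetry-style spans as plain dicts, plus a
# crude temporal heuristic (rapid tool-call bursts) as an attack signal.

def make_span(agent, operation, start_ns, end_ns, attributes=None):
    """Build a minimal OpenTelemetry-style span as a plain dict."""
    return {
        "name": operation,
        "attributes": {"agent.id": agent, **(attributes or {})},
        "start_time_unix_nano": start_ns,
        "end_time_unix_nano": end_ns,
    }

def detect_rapid_tool_bursts(spans, max_calls=3, window_ns=1_000_000_000):
    """Flag agents firing more than `max_calls` tool invocations inside a
    sliding `window_ns` window -- an illustrative excessive-agency check."""
    flagged = set()
    by_agent = {}
    for span in sorted(spans, key=lambda s: s["start_time_unix_nano"]):
        agent = span["attributes"]["agent.id"]
        times = by_agent.setdefault(agent, [])
        times.append(span["start_time_unix_nano"])
        # Drop calls that fell out of the sliding window.
        while times and times[-1] - times[0] > window_ns:
            times.pop(0)
        if len(times) > max_calls:
            flagged.add(agent)
    return flagged

# Synthetic trace: "planner" paces its calls; "executor" bursts.
trace = [make_span("planner", "tool.search",
                   i * 2_000_000_000, i * 2_000_000_000 + 50)
         for i in range(4)]
trace += [make_span("executor", "tool.delete_file",
                    i * 100_000_000, i * 100_000_000 + 50)
          for i in range(6)]

print(detect_rapid_tool_bursts(trace))  # -> {'executor'}
```

A fine-tuned model consumes far richer temporal structure than this single rule, but the example shows the input shape: attack patterns live in the timing and sequencing of spans, not in any single span's content.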