Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models
Published on arXiv: 2601.00848
- Excessive Agency (OWASP LLM Top 10 — LLM08)
- Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding: Accuracy improves from 42.86% to 74.29% (a 31.4-point gain, p < 0.001) across three QLoRA fine-tuning iterations with targeted adversarial augmentation on resource-constrained ARM64 hardware.
Novel Technique: Iterative QLoRA fine-tuning of Foundation-Sec-8B
We present an openly documented methodology for fine-tuning language models to detect temporal attack patterns in multi-agent AI workflows via OpenTelemetry trace analysis. We curate a dataset of 80,851 examples drawn from 18 public cybersecurity sources and 35,026 synthetic OpenTelemetry traces, and apply iterative QLoRA fine-tuning on resource-constrained ARM64 hardware (NVIDIA DGX Spark) across three training iterations with strategic data augmentation. Accuracy on our custom benchmark improves from 42.86% to 74.29%, a statistically significant 31.4-point gain; targeted examples addressing specific knowledge gaps outperform indiscriminate dataset scaling. Key contributions include: (1) a synthetic trace generation methodology for multi-agent coordination attacks and regulatory violations, (2) empirical evidence that training data composition fundamentally determines model behavior, and (3) a complete open release of datasets, training scripts, and evaluation benchmarks on HuggingFace. While practical deployment still requires human oversight because of the residual false positive rate, this work establishes the first reproducible framework enabling practitioners to build custom agentic security models adapted to their own threat landscapes.
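The QLoRA setup described above might be configured along the following lines. This is a sketch only: every hyperparameter (rank, alpha, dropout, target modules) is an illustrative assumption, not the paper's reported configuration.

```python
# Sketch of a QLoRA configuration for a Foundation-Sec-8B base model.
# All hyperparameter values below are placeholder assumptions.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization keeps the 8B base model within the memory
# budget of resource-constrained ARM64 hardware such as DGX Spark.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projections; only these small
# adapter weights are trained, the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

In an iterative scheme like the one described, the base model would be loaded with `quantization_config=bnb_config`, wrapped with `peft.get_peft_model(model, lora_config)`, and retrained after each round of targeted data augmentation.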
Key Contributions
- First open methodology for fine-tuning LLMs on agentic workflow security using OpenTelemetry trace analysis, with all datasets, scripts, and benchmarks released on HuggingFace
- Synthetic trace generation methodology producing 35,026 realistic multi-agent attack traces covering coordination attacks, stealth evasion, and regulatory violations (GDPR, HIPAA, PCI-DSS)
- Empirical evidence that training data composition determines model behavior — a 90% attack-focused dataset produced a 66.7% false positive rate that resisted prompt-engineering mitigations and required architectural remediation
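To make the trace-based framing above concrete, here is a minimal sketch of a synthetic multi-agent trace and a simple temporal heuristic over it. The span layout loosely follows OpenTelemetry conventions, and the burst-detection rule is a hypothetical excessive-agency signal for illustration, not the paper's fine-tuned model or dataset.

```python
# Hypothetical sketch: OpenTelemetry-style spans as plain dicts, plus a
# crude temporal heuristic (rapid tool-call bursts) as an attack signal.

def make_span(agent, operation, start_ns, end_ns, attributes=None):
    """Build a minimal OpenTelemetry-style span as a plain dict."""
    return {
        "name": operation,
        "attributes": {"agent.id": agent, **(attributes or {})},
        "start_time_unix_nano": start_ns,
        "end_time_unix_nano": end_ns,
    }

def detect_rapid_tool_bursts(spans, max_calls=3, window_ns=1_000_000_000):
    """Flag agents firing more than `max_calls` tool invocations inside a
    sliding `window_ns` window -- an illustrative excessive-agency check."""
    flagged = set()
    by_agent = {}
    for span in sorted(spans, key=lambda s: s["start_time_unix_nano"]):
        agent = span["attributes"]["agent.id"]
        times = by_agent.setdefault(agent, [])
        times.append(span["start_time_unix_nano"])
        # Drop calls that fell out of the sliding window.
        while times and times[-1] - times[0] > window_ns:
            times.pop(0)
        if len(times) > max_calls:
            flagged.add(agent)
    return flagged

# Synthetic trace: "planner" paces its calls; "executor" bursts.
trace = [make_span("planner", "tool.search",
                   i * 2_000_000_000, i * 2_000_000_000 + 50)
         for i in range(4)]
trace += [make_span("executor", "tool.delete_file",
                    i * 100_000_000, i * 100_000_000 + 50)
          for i in range(6)]

print(detect_rapid_tool_bursts(trace))  # -> {'executor'}
```

A fine-tuned model consumes far richer temporal structure than this single rule, but the example shows the input shape: attack patterns live in the timing and sequencing of spans, not in any single span's content.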