defense 2026

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

Jiaqi Li 1,2, Yang Zhao 1,2, Bin Sun 3, Yang Yu 4, Jian Chang 5, Lidong Zhai 1,2

Published on arXiv

2604.24020

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Blue-Team Agents

LLMs for Security — LS07

Key Finding

Weakest-first ASAT raises the average security awareness score from 80.9 to 96.9 (+15.9 points) over 16 sessions, covering 11/12 threat dimensions and outperforming uniform-random scheduling by 6.5 points.

ClawdGo

Novel technique introduced


Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving the agent's own threat judgement entirely untrained. We present ClawdGo, a framework for endogenous security awareness training: we teach the agent to recognise and reason about threats from the inside, at inference time, with no model modification. We make four contributions: TLDT (Three-Layer Domain Taxonomy) organises 12 trainable dimensions across Self-Defence, Owner-Protection, and Enterprise-Security layers; ASAT (Autonomous Security Awareness Training) is a self-play loop in which the agent alternates between attacker, defender, and evaluator roles under weakest-first curriculum scheduling; CSMA (Cross-Session Memory Accumulation) compounds skill gains via a four-layer persistent memory architecture and Axiom Crystallisation Promotion (ACP); and SACP (Security Awareness Calibration Problem) formalises the precision-recall tradeoff introduced by endogenous training. Live experiments show that weakest-first ASAT raises the average TLDT score from 80.9 to 96.9 over 16 sessions, outperforming uniform-random scheduling by 6.5 points and covering 11 of 12 dimensions. CSMA retains the full gain across sessions; a cold-start ablation recovers only 2.4 points, leaving a 13.6-point gap. E-mode generates 32 TLDT-conformant scenarios covering all 12 dimensions. SACP is observed when a heavily trained agent misclassifies a legitimate capability assessment as prompt injection (30/160).
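
The weakest-first curriculum can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not ClawdGo's implementation. The dimension labels D1 to D12 are hypothetical placeholders, because the 12 TLDT dimensions are not enumerated here. toy_session is a hypothetical learning model that stands in for one attacker/defender/evaluator self-play round, and it assumes each focused session closes half of the remaining gap to a score of 100. The names weakest_first, uniform_random and run_curriculum are illustrative.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical placeholder IDs: the 12 TLDT dimensions are not listed here
# (only S3, supply-chain awareness, is named), so generic labels are used.
DIMENSIONS: List[str] = [f"D{i}" for i in range(1, 13)]

Scheduler = Callable[[Dict[str, float], random.Random], str]


def weakest_first(scores: Dict[str, float], rng: random.Random) -> str:
    """Choose the lowest-scoring dimension; ties are broken by name.

    rng is unused; it is kept so both schedulers share one signature.
    """
    return min(scores, key=lambda d: (scores[d], d))


def uniform_random(scores: Dict[str, float], rng: random.Random) -> str:
    """Baseline: choose any dimension with equal probability."""
    return rng.choice(sorted(scores))


def run_curriculum(
    initial: Dict[str, float],
    pick: Scheduler,
    train_session: Callable[[str, float], float],
    sessions: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Run `sessions` rounds, each targeting the dimension `pick` selects.

    `train_session(dim, score)` is a hypothetical stand-in for one
    attacker/defender/evaluator self-play round; it returns the
    evaluator's updated score for `dim`.
    """
    rng = random.Random(seed)
    scores = dict(initial)
    for _ in range(sessions):
        dim = pick(scores, rng)
        scores[dim] = train_session(dim, scores[dim])
    return scores


def toy_session(dim: str, score: float) -> float:
    """Hypothetical learning model: one focused session closes half the gap to 100."""
    return score + 0.5 * (100.0 - score)


if __name__ == "__main__":
    # Illustrative starting scores only; these are not the paper's data.
    start = {d: 70.0 + 2.0 * i for i, d in enumerate(DIMENSIONS)}
    wf = run_curriculum(start, weakest_first, toy_session, sessions=16)
    ur = run_curriculum(start, uniform_random, toy_session, sessions=16)
    print(f"weakest-first mean: {mean(wf.values()):.1f}")
    print(f"uniform-random mean: {mean(ur.values()):.1f}")
```

Under this toy model's diminishing returns, training the weakest dimension is the greedy choice that maximises each session's gain. As a result, it never does worse than uniform-random selection here. This sketch does not establish whether the same reasoning explains the reported 6.5-point margin.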


Key Contributions

  • TLDT taxonomy organizing 12 security dimensions across Self-Defence, Owner-Protection, and Enterprise-Security layers for AI agents
  • ASAT self-play training loop in which the agent alternates between attacker, defender, and evaluator roles under weakest-first curriculum scheduling
  • CSMA persistent memory architecture with Axiom Crystallisation Promotion enabling cross-session security skill accumulation without model fine-tuning
  • SACP formalization of the precision-recall tradeoff in endogenous security training
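
SACP concerns the precision-recall tradeoff, and the standard definitions make that tradeoff concrete. The sketch below assumes the agent's judgement can be reduced to a numeric suspicion score compared against a threshold. This scoring model and the names Confusion and confusion_at are hypothetical and do not come from the paper. If a benign input is flagged, such as a legitimate capability assessment treated as prompt injection, it counts as a false positive and lowers precision. Lowering the threshold to catch more attacks raises recall at the cost of precision.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts for the binary judgement 'is this input an attack?'."""

    tp: int  # attacks correctly flagged
    fp: int  # benign inputs wrongly flagged (over-defence)
    fn: int  # attacks that were missed
    tn: int  # benign inputs correctly allowed

    @property
    def precision(self) -> float:
        """Share of flagged inputs that really were attacks (0.0 if none flagged, by convention)."""
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        """Share of real attacks that were flagged (0.0 if there were no attacks)."""
        attacks = self.tp + self.fn
        return self.tp / attacks if attacks else 0.0


def confusion_at(
    suspicion: Sequence[float], is_attack: Sequence[bool], threshold: float
) -> Confusion:
    """Flag every input whose hypothetical suspicion score is >= threshold."""
    if len(suspicion) != len(is_attack):
        raise ValueError("suspicion and is_attack must have the same length")
    tp = fp = fn = tn = 0
    for score, attack in zip(suspicion, is_attack):
        flagged = score >= threshold
        if flagged and attack:
            tp += 1
        elif flagged:
            fp += 1
        elif attack:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)
```

Sweeping `threshold` over a labelled evaluation set traces a precision-recall curve. A more suspicious agent (lower threshold) moves toward higher recall and lower precision.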

🛡️ Threat Analysis

AI Supply Chain Attacks

The paper explicitly addresses supply-chain attacks on AI agents via malicious skills/packages distributed through the ClawHub and skills.sh repositories (76 confirmed malicious payloads in 3,984 packages). The TLDT S3 dimension targets supply-chain threat awareness.
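
For scale, the scan figures quoted above correspond to a malicious-payload rate of just under 2% of the scanned packages, or roughly one package in 52:

```latex
\frac{76}{3984} \approx 0.0191 \approx 1.9\%, \qquad \frac{3984}{76} \approx 52.4
```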


Details

Domains
nlp
Model Types
llm
Threat Tags
inference_time, training_time
Datasets
OpenClaw platform instances, ClawHub/skills.sh package scan (3,984 packages)
Applications
autonomous ai agents, llm agent security, enterprise ai deployment