Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents
Jiaqi Li 1,2, Yang Zhao 1,2, Bin Sun 3, Yang Yu 4, Jian Chang 5, Lidong Zhai 1,2
Published on arXiv: 2604.24020
AI Supply Chain Attacks
OWASP ML Top 10 — ML06
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Blue-Team Agents
LLMs for Security — LS07
Key Finding
Weakest-first ASAT raises the average security awareness score from 80.9 to 96.9 (+15.9 points) over 16 sessions, covers 11 of 12 threat dimensions, and outperforms uniform-random scheduling by 6.5 points.
Novel technique introduced
ClawdGo
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving the agent's own threat judgement entirely untrained. We present ClawdGo, a framework for endogenous security awareness training: we teach the agent to recognise and reason about threats from the inside, at inference time, with no model modification. Four contributions are introduced: TLDT (Three-Layer Domain Taxonomy) organises 12 trainable dimensions across Self-Defence, Owner-Protection, and Enterprise-Security layers; ASAT (Autonomous Security Awareness Training) is a self-play loop where the agent alternates attacker, defender, and evaluator roles under weakest-first curriculum scheduling; CSMA (Cross-Session Memory Accumulation) compounds skill gains via a four-layer persistent memory architecture and Axiom Crystallisation Promotion (ACP); and SACP (Security Awareness Calibration Problem) formalises the precision-recall tradeoff introduced by endogenous training. Live experiments show weakest-first ASAT raises average TLDT score from 80.9 to 96.9 over 16 sessions, outperforming uniform-random scheduling by 6.5 points and covering 11 of 12 dimensions. CSMA retains the full gain across sessions; cold-start ablation recovers only 2.4 points, leaving a 13.6-point gap. E-mode generates 32 TLDT-conformant scenarios covering all 12 dimensions. SACP is observed when a heavily trained agent classifies a legitimate capability assessment as prompt injection (30/160).
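The weakest-first scheduling idea behind ASAT can be illustrated with a minimal Python sketch. The abstract does not describe the scheduler's implementation, so everything here is an assumption made for illustration: the dimension names, the starting scores, the `gain` update rule, and the helper names `weakest_first`, `uniform_random`, and `run_sessions` are all hypothetical. The sketch does not attempt to reproduce the reported numbers.

```python
import random


def weakest_first(scores):
    """Pick the dimension with the lowest current score (ties broken by name)."""
    return min(sorted(scores), key=lambda dim: scores[dim])


def uniform_random(scores, rng):
    """Baseline: pick any dimension with equal probability."""
    return rng.choice(sorted(scores))


def run_sessions(scores, pick, sessions, gain=0.5, cap=100.0):
    """Simulate training sessions on a copy of `scores`.

    Hypothetical update rule: each session closes a fixed fraction (`gain`)
    of the gap between the picked dimension's score and `cap`.
    Returns the final scores and the set of dimensions that were trained.
    """
    current = dict(scores)
    trained = set()
    for _ in range(sessions):
        dim = pick(current)
        current[dim] += gain * (cap - current[dim])
        trained.add(dim)
    return current, trained


if __name__ == "__main__":
    # Hypothetical starting profile; not taken from the paper.
    start = {"dim_a": 92.0, "dim_b": 64.0, "dim_c": 78.0, "dim_d": 85.0}
    rng = random.Random(0)

    wf_scores, wf_trained = run_sessions(start, weakest_first, sessions=4)
    rnd_scores, rnd_trained = run_sessions(
        start, lambda s: uniform_random(s, rng), sessions=4
    )

    print("weakest-first:", wf_scores, sorted(wf_trained))
    print("uniform-random:", rnd_scores, sorted(rnd_trained))
```

Under this toy model, weakest-first selection always spends the next session on the current lowest score, whereas uniform-random selection may revisit dimensions that are already strong.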
Key Contributions
- TLDT taxonomy organizing 12 security dimensions across Self-Defence, Owner-Protection, and Enterprise-Security layers for AI agents
- ASAT self-play training loop where the agent alternates attacker/defender/evaluator roles under weakest-first curriculum scheduling
- CSMA persistent memory architecture with Axiom Crystallisation Promotion enabling cross-session security skill accumulation without model fine-tuning
- SACP formalization of the precision-recall tradeoff in endogenous security training
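The precision-recall tradeoff named in the last contribution can be made concrete with the standard definitions of the two metrics. The sketch below is not the paper's SACP formalisation; it only shows how an agent that flags more requests as threats can gain recall while losing precision. The function name `precision_recall` and all labels and predictions are hypothetical.

```python
def precision_recall(labels, predictions):
    """Compute precision and recall for binary threat detection.

    labels[i] is True when request i really is a threat;
    predictions[i] is True when the agent flagged request i as a threat.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # threats correctly flagged
    fp = sum(1 for y, p in pairs if not y and p)      # benign requests flagged
    fn = sum(1 for y, p in pairs if y and not p)      # threats missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical ground truth: three threats followed by three benign requests.
    labels = [True, True, True, False, False, False]

    # A cautious agent misses one threat but raises no false alarms.
    cautious = [True, True, False, False, False, False]
    # An over-sensitive agent catches every threat but also flags benign requests.
    oversensitive = [True, True, True, True, True, False]

    print("cautious:", precision_recall(labels, cautious))
    print("over-sensitive:", precision_recall(labels, oversensitive))
```

In this illustration the over-sensitive agent misses nothing, yet part of what it flags is legitimate, which is the kind of miscalibration the summary describes when a legitimate capability assessment is classified as prompt injection.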
🛡️ Threat Analysis
The paper explicitly addresses supply-chain attacks on AI agents via malicious skills and packages distributed through the ClawHub and skills.sh repositories (76 confirmed malicious payloads in 3,984 packages, roughly 1.9%). The TLDT S3 dimension targets supply-chain threat awareness.