defense 2026

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

Published on arXiv

2604.24020

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Blue-Team Agents

LLMs for Security — LS07

Key Finding

Weakest-first ASAT raises average security awareness score from 80.9 to 96.9 (+15.9 points) over 16 sessions, covering 11/12 threat dimensions, outperforming uniform-random scheduling by 6.5 points

ClawdGo

Novel technique introduced

Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving the agent's own threat judgement entirely untrained. We present ClawdGo, a framework for endogenous security awareness training: we teach the agent to recognise and reason about threats from the inside, at inference time, with no model modification. Four contributions are introduced: TLDT (Three-Layer Domain Taxonomy) organises 12 trainable dimensions across Self-Defence, Owner-Protection, and Enterprise-Security layers; ASAT (Autonomous Security Awareness Training) is a self-play loop where the agent alternates attacker, defender, and evaluator roles under weakest-first curriculum scheduling; CSMA (Cross-Session Memory Accumulation) compounds skill gains via a four-layer persistent memory architecture and Axiom Crystallisation Promotion (ACP); and SACP (Security Awareness Calibration Problem) formalises the precision-recall tradeoff introduced by endogenous training. Live experiments show weakest-first ASAT raises average TLDT score from 80.9 to 96.9 over 16 sessions, outperforming uniform-random scheduling by 6.5 points and covering 11 of 12 dimensions. CSMA retains the full gain across sessions; cold-start ablation recovers only 2.4 points, leaving a 13.6-point gap. E-mode generates 32 TLDT-conformant scenarios covering all 12 dimensions. SACP is observed when a heavily trained agent classifies a legitimate capability assessment as prompt injection (30/160).
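The weakest-first curriculum mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the dimension labels other than S3, the fixed per-session gain, and the score cap are all assumptions made for illustration. It only shows the scheduling idea, in which each session trains the dimension with the lowest current score, next to a uniform-random baseline.

```python
import random

def pick_weakest(scores):
    """Return the dimension with the lowest score (ties broken by name order)."""
    return min(sorted(scores), key=lambda d: scores[d])

def pick_uniform(scores, rng):
    """Baseline: pick any dimension uniformly at random."""
    return rng.choice(sorted(scores))

def run_sessions(scores, n_sessions, pick, gain=5.0, cap=100.0):
    """Simulate training sessions; `gain` and `cap` are illustrative assumptions."""
    scores = dict(scores)  # work on a copy so the caller's dict is untouched
    trained = []
    for _ in range(n_sessions):
        dim = pick(scores)
        scores[dim] = min(cap, scores[dim] + gain)
        trained.append(dim)
    return scores, trained

# Hypothetical starting scores for four placeholder dimensions.
start = {"S1": 90.0, "S2": 70.0, "S3": 60.0, "O1": 85.0}

final, order = run_sessions(start, 3, pick_weakest)
print(order)   # dimensions trained, in order

rng = random.Random(0)
baseline, _ = run_sessions(start, 3, lambda s: pick_uniform(s, rng))
print(sum(final.values()) / len(final), sum(baseline.values()) / len(baseline))
```

With a fixed gain per session both schedulers raise the average equally; the difference in this toy setup is only where the gain lands. Any advantage like the one reported in the paper would depend on how real gains vary with the current score, which this sketch does not model.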

Key Contributions

TLDT taxonomy organising 12 security dimensions for AI agents across Self-Defence, Owner-Protection, and Enterprise-Security layers
ASAT self-play training loop in which the agent alternates attacker, defender, and evaluator roles under weakest-first curriculum scheduling
CSMA persistent memory architecture with Axiom Crystallisation Promotion, enabling cross-session security skill accumulation without model fine-tuning
SACP formalisation of the precision-recall tradeoff in endogenous security training
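The precision-recall tradeoff behind SACP can be made concrete with a short calculation. The counts below are hypothetical and are not taken from the paper. They only illustrate how a more suspicious agent can gain recall (fewer missed attacks) while losing precision (more legitimate requests flagged as threats).

```python
def precision_recall(tp, fp, fn):
    """Compute (precision, recall) from true-positive, false-positive,
    and false-negative counts; returns 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts, for illustration only.
before = precision_recall(tp=60, fp=5, fn=40)   # lightly trained agent
after = precision_recall(tp=95, fp=30, fn=5)    # heavily trained agent

print(f"before: precision={before[0]:.2f} recall={before[1]:.2f}")
print(f"after:  precision={after[0]:.2f} recall={after[1]:.2f}")
```

In this made-up example recall rises from 0.60 to 0.95 while precision falls from about 0.92 to 0.76, which is the kind of calibration shift the SACP contribution is described as formalising.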

🛡️ Threat Analysis

AI Supply Chain Attacks

Paper explicitly addresses supply-chain attacks on AI agents via malicious skills/packages distributed through ClawHub and skills.sh repositories (76 confirmed malicious payloads in 3,984 packages). The TLDT S3 dimension targets supply-chain threat awareness.
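For scale, the scan figures quoted above work out to roughly 1.9% of packages carrying a confirmed malicious payload. The snippet below just does that arithmetic on the two numbers from the summary; the variable names are illustrative.

```python
malicious = 76     # confirmed malicious payloads (from the summary above)
scanned = 3984     # packages scanned (from the summary above)

rate = malicious / scanned
print(f"{rate:.2%}")                           # share of scanned packages
print(f"about 1 in {scanned / malicious:.0f}")  # same figure as odds
```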

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time, training_time

Datasets

OpenClaw platform instances, ClawHub/skills.sh package scan (3,984 packages)

Applications

autonomous ai agents, llm agent security, enterprise ai deployment

Similar Papers

Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Agent-Sentry: Bounding LLM Agents via Execution Provenance

Securing AI Agents: Implementing Role-Based Access Control for Industrial Applications

Optimizing Agent Planning for Security and Autonomy

Policy Compiler for Secure Agentic Systems

A2AS: Agentic AI Runtime Security and Self-Defense

BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability