Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains
Published on arXiv: 2603.21149
- Output Integrity Attack (OWASP ML Top 10 — ML09)
- Insecure Plugin Design (OWASP LLM Top 10 — LLM07)
- Excessive Agency (OWASP LLM Top 10 — LLM08)
Key Finding
Achieves 100% classification accuracy, with zero false positives and zero false negatives, across 181 test cases spanning five implemented domains. Detects real bugs that empirical testing misses, including an INT_MIN overflow in branchless RISC-V assembly, and proves that unconstrained tool-API string parameters are formally unverifiable.
Novel technique introduced: substrate-guard
An autonomous AI ecosystem (SUBSTRATE S3), generating product specifications without explicit instructions about formal methods, independently proposed using the Z3 SMT solver across six distinct AI-safety domains: verification of LLM-generated code, tool-API safety for AI agents, post-distillation reasoning correctness, CLI command validation, hardware assembly verification, and smart-contract safety. These convergent discoveries, occurring across 8 products over 13 days with Jaccard similarity below 15% between variants, suggest that formal verification is not merely a useful technique for AI safety but an emergent property of any sufficiently complex system reasoning about its own safety. We propose a unified framework (substrate-guard) that applies Z3-based verification to all six output classes through a common API, and we evaluate it on 181 test cases across the five implemented domains, achieving 100% classification accuracy with zero false positives and zero false negatives. The framework detected real bugs that empirical testing would miss, including an INT_MIN overflow in branchless RISC-V assembly, and mathematically proved that unconstrained string parameters in tool APIs are formally unverifiable.
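The INT_MIN overflow mentioned above is a classic property of the branchless absolute-value idiom (arithmetic shift, XOR, subtract) in 32-bit two's complement: the paper does not reproduce the exact RISC-V sequence, so the sketch below is an assumed reconstruction of that pattern, modeled in Python with explicit 32-bit wrapping. Exhaustive or random testing rarely hits this single failing input, which is why an SMT query over the full bit-vector domain finds it while empirical testing does not.

```python
def to_i32(v: int) -> int:
    """Wrap an arbitrary Python int into signed 32-bit two's complement."""
    v &= 0xFFFFFFFF
    return v - 0x1_0000_0000 if v >= 0x8000_0000 else v

def branchless_abs32(x: int) -> int:
    """Branchless abs in the style of a RISC-V srai/xor/sub sequence
    (assumed pattern, not taken verbatim from the paper)."""
    mask = x >> 31                     # all-ones when x < 0; Python's >> is arithmetic
    return to_i32((x ^ mask) - mask)   # flip bits and add 1 when negative

# Correct on every input except one: abs(INT_MIN) wraps back to INT_MIN,
# so the "absolute value" is still negative.
print(branchless_abs32(-5))        # 5
print(branchless_abs32(-2**31))    # -2147483648, the overflow case
```

A bounded model check over all 2^32 inputs is exactly the kind of query an SMT solver discharges in milliseconds, whereas a unit-test suite would need to guess the one failing value.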
Key Contributions
- Unified formal verification framework (substrate-guard) applying Z3 SMT solver across six AI safety domains
- Empirical evidence that formal verification emerges independently as a safety property in autonomous AI systems
- 100% classification accuracy on 181 test cases, including detection of bugs missed by empirical testing
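The paper describes substrate-guard as exposing the six verifiers through a common API, but does not publish that interface; the following is a hypothetical sketch of what such a dispatch layer could look like, with a toy CLI checker standing in for the real Z3 query (the names `Verdict`, `checker`, and `verify` are illustrative, not from the paper).

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Verdict:
    """Result of one verification run (hypothetical shape)."""
    safe: bool
    reason: str

_CHECKERS: Dict[str, Callable[[str], Verdict]] = {}

def checker(domain: str):
    """Decorator registering a per-domain verifier under the common API."""
    def register(fn: Callable[[str], Verdict]) -> Callable[[str], Verdict]:
        _CHECKERS[domain] = fn
        return fn
    return register

def verify(domain: str, artifact: str) -> Verdict:
    """Single entry point: route an artifact to its domain's checker."""
    if domain not in _CHECKERS:
        raise ValueError(f"no checker registered for domain {domain!r}")
    return _CHECKERS[domain](artifact)

@checker("cli")
def check_cli(command: str) -> Verdict:
    # Toy stand-in for a real SMT query over a command grammar.
    if "rm -rf" in command:
        return Verdict(False, "destructive recursive delete")
    return Verdict(True, "no known-destructive pattern")
```

Adding a new output class (assembly, smart contracts, tool-API calls) then means registering one more checker rather than changing callers, which is the design property a "common API across six domains" implies.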
🛡️ Threat Analysis
The paper addresses output integrity by formally verifying the correctness and safety of AI-generated outputs (code, commands, contracts, assembly) to prevent tampering or incorrect execution. The Z3-based verification ensures outputs meet safety constraints before deployment.
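One concrete consequence of the "unconstrained string parameters are formally unverifiable" result is that safety only becomes checkable once a parameter's domain is constrained: over all possible strings, any blacklist admits a bypass, but a whitelist grammar makes the safety argument go through. The fragment below is an illustrative stdlib sketch of that idea (the parameter name and pattern are assumptions, not from the paper), using a regular-expression whitelist in place of a Z3 string constraint.

```python
import re

# Hypothetical constrained tool-API parameter: a "filename" restricted to a
# safe alphabet. With this whitelist, absence of path traversal and shell
# metacharacters follows by construction: '/', '.', ';', and spaces are
# simply not in the allowed character class.
FILENAME = re.compile(r"[A-Za-z0-9_\-]{1,64}")

def is_safe_filename(arg: str) -> bool:
    """Decidable safety check for the constrained parameter."""
    return re.fullmatch(FILENAME, arg) is not None
```

Without the constraint, the same check is impossible to state, let alone verify, which mirrors the paper's proof that unconstrained string parameters cannot be certified safe before deployment.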