defense 2026

Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains

Octavian Untila

Published on arXiv

2603.21149

Output Integrity Attack

OWASP ML Top 10 — ML09

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Achieves 100% classification accuracy with zero false positives or negatives across 181 test cases in five domains, detecting real bugs including an INT_MIN overflow and formally unverifiable parameters

substrate-guard

Novel technique introduced


An autonomous AI ecosystem (SUBSTRATE S3), generating product specifications without explicit instructions about formal methods, independently proposed the use of the Z3 SMT solver across six distinct domains of AI safety: verification of LLM-generated code, tool API safety for AI agents, post-distillation reasoning correctness, CLI command validation, hardware assembly verification, and smart contract safety. These convergent discoveries, occurring across 8 products over 13 days with Jaccard similarity below 15% between variants, suggest that formal verification is not merely a useful technique for AI safety but an emergent property of any sufficiently complex system reasoning about its own safety. We propose a unified framework (substrate-guard) that applies Z3-based verification to all six output classes through a common API, and we evaluate it on 181 test cases across five implemented domains, achieving 100% classification accuracy with zero false positives and zero false negatives. Our framework detected real bugs that empirical testing would miss, including an INT_MIN overflow in branchless RISC-V assembly, and mathematically proved that unconstrained string parameters in tool APIs are formally unverifiable.
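The INT_MIN overflow mentioned in the abstract is the classic branchless-absolute-value pitfall: |INT_MIN| is not representable in 32 bits, so the identity abs(x) ≥ 0 fails on exactly one input. The paper's RISC-V sequence is not reproduced in this summary, but the arithmetic can be sketched in Python by simulating 32-bit two's-complement registers (the helper names below are illustrative, not from the paper):

```python
INT_MIN = -2**31

def to_signed32(x):
    """Wrap to a signed 32-bit value (two's-complement), as on RV32."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

def branchless_abs32(x):
    """Classic branchless abs: (x ^ m) - m where m = x >> 31
    (arithmetic shift, so m is -1 for negative x, else 0).
    Every step wraps at 32 bits."""
    m = to_signed32(x >> 31)
    return to_signed32((x ^ m) - m)

print(branchless_abs32(-7))       # 7
print(branchless_abs32(INT_MIN))  # -2147483648: |INT_MIN| overflows back to INT_MIN
```

An SMT solver finds this counterexample automatically by checking abs(x) >= 0 over 32-bit bitvectors, whereas empirical test suites rarely probe the single failing input.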


Key Contributions

  • Unified formal verification framework (substrate-guard) applying Z3 SMT solver across six AI safety domains
  • Empirical evidence that formal verification emerges independently as a safety property in autonomous AI systems
  • 100% classification accuracy on 181 test cases with detection of bugs missed by empirical testing
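The contributions above center on one verification entry point shared across six output classes. The paper's actual interface is not shown in this summary; the following is a hypothetical sketch of what a domain-dispatching API could look like, with only a trivial CLI stub filled in (a real implementation would lower each artifact to SMT constraints and discharge them with a solver such as Z3; all names here are ours):

```python
from dataclasses import dataclass
from enum import Enum, auto

class Domain(Enum):
    """The six output classes substrate-guard targets, per the abstract."""
    LLM_CODE = auto()
    TOOL_API = auto()
    DISTILLED_REASONING = auto()
    CLI_COMMAND = auto()
    HW_ASSEMBLY = auto()
    SMART_CONTRACT = auto()

@dataclass
class Verdict:
    safe: bool
    reason: str

def verify(domain: Domain, artifact: str) -> Verdict:
    """Hypothetical common entry point. Only a pattern-based CLI stub is
    sketched; it shows the interface shape, not the SMT lowering."""
    if domain is Domain.CLI_COMMAND:
        destructive = ("rm -rf /", "mkfs", ":(){ :|:& };:")
        for pattern in destructive:
            if pattern in artifact:
                return Verdict(False, f"destructive pattern: {pattern!r}")
        return Verdict(True, "no unsafe pattern found")
    raise NotImplementedError(f"{domain.name}: lowering not sketched here")

print(verify(Domain.CLI_COMMAND, "ls -la /tmp"))
```

The design choice worth noting is the single `verify(domain, artifact)` surface: callers need not know which constraint encoding backs each domain.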

🛡️ Threat Analysis

Output Integrity Attack

The paper addresses output integrity by formally verifying the correctness and safety of AI-generated outputs (code, commands, contracts, assembly) to prevent tampering or incorrect execution. The Z3-based verification ensures outputs meet safety constraints before deployment.
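The unverifiability result for unconstrained string parameters can be illustrated with a hypothetical schema gate (none of these names come from the paper): properties can only be stated, and hence proved, over a parameter whose value space is constrained, so a tool-API linter should reject bare string parameters rather than silently pass them to deployment.

```python
def verifiable(param: dict) -> bool:
    """Return True if the parameter spec is constrained enough to
    admit formal properties. An unconstrained string admits every
    value, so nothing about it can be discharged by a solver."""
    t = param.get("type")
    if t == "enum":
        return bool(param.get("values"))
    if t == "int":
        return "min" in param and "max" in param
    if t == "string":
        # A regex pattern or a length bound makes the space constrainable.
        return "pattern" in param or "max_length" in param
    return False

print(verifiable({"type": "enum", "values": ["read", "write"]}))  # True
print(verifiable({"type": "string"}))                             # False
```

This mirrors the check-before-deployment posture described above: artifacts that cannot be constrained are rejected up front instead of being trusted at inference time.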


Details

Domains
nlp, multimodal
Model Types
llm
Threat Tags
inference_time
Applications
llm code generation, ai agent tool use, smart contract generation, cli command validation, hardware assembly verification