defense 2026

Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains

Octavian Untila

Published on arXiv

2603.21149

Output Integrity Attack

OWASP ML Top 10 — ML09

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Achieves 100% classification accuracy with zero false positives or negatives across 181 test cases in five domains, detecting real bugs including an INT_MIN overflow and formally unverifiable parameters

substrate-guard

Novel technique introduced


An autonomous AI ecosystem (SUBSTRATE S3), generating product specifications without explicit instructions about formal methods, independently proposed the use of the Z3 SMT solver across six distinct domains of AI safety: verification of LLM-generated code, tool API safety for AI agents, post-distillation reasoning correctness, CLI command validation, hardware assembly verification, and smart contract safety. These convergent discoveries, occurring across 8 products over 13 days with Jaccard similarity below 15% between variants, suggest that formal verification is not merely a useful technique for AI safety but an emergent property of any sufficiently complex system reasoning about its own safety. We propose a unified framework (substrate-guard) that applies Z3-based verification to all six output classes through a common API, and we evaluate it on 181 test cases across five implemented domains, achieving 100% classification accuracy with zero false positives and zero false negatives. Our framework detected real bugs that empirical testing would miss, including an INT_MIN overflow in branchless RISC-V assembly, and mathematically proved that unconstrained string parameters in tool APIs are formally unverifiable.
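The INT_MIN overflow mentioned in the abstract is the classic branchless-absolute-value pitfall: |INT_MIN| is not representable in 32 bits, so the identity abs(x) ≥ 0 fails on exactly one input. The paper's RISC-V sequence is not reproduced in this summary, but the arithmetic can be sketched in Python by simulating 32-bit two's-complement registers (the helper names below are illustrative, not from the paper):

```python
INT_MIN = -2**31

def to_signed32(x):
    """Wrap to a signed 32-bit value (two's-complement), as on RV32."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

def branchless_abs32(x):
    """Classic branchless abs: (x ^ m) - m where m = x >> 31
    (arithmetic shift, so m is -1 for negative x, else 0).
    Every step wraps at 32 bits."""
    m = to_signed32(x >> 31)
    return to_signed32((x ^ m) - m)

print(branchless_abs32(-7))       # 7
print(branchless_abs32(INT_MIN))  # -2147483648: |INT_MIN| overflows back to INT_MIN
```

An SMT solver finds this counterexample automatically by checking abs(x) >= 0 over 32-bit bitvectors, whereas empirical test suites rarely probe the single failing input.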


Key Contributions

  • Unified formal verification framework (substrate-guard) applying Z3 SMT solver across six AI safety domains
  • Empirical evidence that formal verification emerges independently as a safety property in autonomous AI systems
  • 100% classification accuracy on 181 test cases with detection of bugs missed by empirical testing
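The contributions above center on one verification entry point shared across six output classes. The paper's actual interface is not shown in this summary; the following is a hypothetical sketch of what a domain-dispatching API could look like, with only a trivial CLI stub filled in (a real implementation would lower each artifact to SMT constraints and discharge them with a solver such as Z3; all names here are ours):

```python
from dataclasses import dataclass
from enum import Enum, auto

class Domain(Enum):
    """The six output classes substrate-guard targets, per the abstract."""
    LLM_CODE = auto()
    TOOL_API = auto()
    DISTILLED_REASONING = auto()
    CLI_COMMAND = auto()
    HW_ASSEMBLY = auto()
    SMART_CONTRACT = auto()

@dataclass
class Verdict:
    safe: bool
    reason: str

def verify(domain: Domain, artifact: str) -> Verdict:
    """Hypothetical common entry point. Only a pattern-based CLI stub is
    sketched; it shows the interface shape, not the SMT lowering."""
    if domain is Domain.CLI_COMMAND:
        destructive = ("rm -rf /", "mkfs", ":(){ :|:& };:")
        for pattern in destructive:
            if pattern in artifact:
                return Verdict(False, f"destructive pattern: {pattern!r}")
        return Verdict(True, "no unsafe pattern found")
    raise NotImplementedError(f"{domain.name}: lowering not sketched here")

print(verify(Domain.CLI_COMMAND, "ls -la /tmp"))
```

The design choice worth noting is the single `verify(domain, artifact)` surface: callers need not know which constraint encoding backs each domain.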

🛡️ Threat Analysis

Output Integrity Attack

The paper addresses output integrity by formally verifying the correctness and safety of AI-generated outputs (code, commands, contracts, assembly) to prevent tampering or incorrect execution. The Z3-based verification ensures outputs meet safety constraints before deployment.
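The unverifiability result for unconstrained string parameters can be illustrated with a hypothetical schema gate (none of these names come from the paper): properties can only be stated, and hence proved, over a parameter whose value space is constrained, so a tool-API linter should reject bare string parameters rather than silently pass them to deployment.

```python
def verifiable(param: dict) -> bool:
    """Return True if the parameter spec is constrained enough to
    admit formal properties. An unconstrained string admits every
    value, so nothing about it can be discharged by a solver."""
    t = param.get("type")
    if t == "enum":
        return bool(param.get("values"))
    if t == "int":
        return "min" in param and "max" in param
    if t == "string":
        # A regex pattern or a length bound makes the space constrainable.
        return "pattern" in param or "max_length" in param
    return False

print(verifiable({"type": "enum", "values": ["read", "write"]}))  # True
print(verifiable({"type": "string"}))                             # False
```

This mirrors the check-before-deployment posture described above: artifacts that cannot be constrained are rejected up front instead of being trusted at inference time.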


Details

Domains
nlp, multimodal
Model Types
llm
Threat Tags
inference_time
Applications
llm code generation, ai agent tool use, smart contract generation, cli command validation, hardware assembly verification