Rule Encoding and Compliance in Large Language Models: An Information-Theoretic Analysis
Published on arXiv (2510.05106)
Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding
Hot reloading of formally verified rule sets provably increases the asymptotic probability of compliant outputs; rule formats with low syntactic entropy improve pointer fidelity, but the analysis exposes a trade-off between anchor redundancy and attention entropy.
Novel technique: Dynamic Rule Verification with Hot Reloading
Abstract
The design of safety-critical agents based on large language models (LLMs) requires more than simple prompt engineering. This paper presents a comprehensive information-theoretic analysis of how rule encodings in system prompts influence attention mechanisms and compliance behaviour. We demonstrate that rule formats with low syntactic entropy and highly concentrated anchors reduce attention entropy and improve pointer fidelity, but reveal a fundamental trade-off between anchor redundancy and attention entropy that previous work failed to recognize. Through formal analysis of multiple attention architectures including causal, bidirectional, local sparse, kernelized, and cross-attention mechanisms, we establish bounds on pointer fidelity and show how anchor placement strategies must account for competing fidelity and entropy objectives. Combining these insights with a dynamic rule verification architecture, we provide a formal proof that hot reloading of verified rule sets increases the asymptotic probability of compliant outputs. These findings underscore the necessity of principled anchor design and dual enforcement mechanisms to protect LLM-based agents against prompt injection attacks while maintaining compliance in evolving domains.
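The abstract's central quantity, attention entropy, can be illustrated with a minimal sketch. This is not the paper's code; it simply computes Shannon entropy over a hypothetical normalized attention distribution, showing why attention concentrated on a few rule anchors has lower entropy than diffuse attention:

```python
import math

def attention_entropy(weights):
    """Shannon entropy (in nats) of a normalized attention distribution.

    Lower entropy means the attention mass is concentrated on fewer
    tokens, e.g. on a small set of rule anchors in the system prompt."""
    return -sum(w * math.log(w) for w in weights if w > 0)

# Hypothetical attention over 5 tokens: concentrated on one anchor vs. uniform.
concentrated = [0.85, 0.05, 0.05, 0.03, 0.02]
diffuse = [0.2, 0.2, 0.2, 0.2, 0.2]

print(attention_entropy(concentrated))  # well below the uniform case
print(attention_entropy(diffuse))       # maximal for 5 tokens: log(5)
```

The uniform distribution attains the maximum entropy log(5) ≈ 1.609 nats; the concentrated one scores roughly a third of that, which is the sense in which low-entropy anchors "sharpen" attention.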
Key Contributions
- Formal information-theoretic analysis of how rule encoding formats (syntactic entropy, anchor placement) affect attention entropy and pointer fidelity across multiple attention architectures
- Identification of a fundamental trade-off between anchor redundancy and attention entropy that prior work missed, with established bounds on pointer fidelity
- Dynamic rule verification architecture with hot reloading of verified rule sets, with formal proof that this increases asymptotic probability of compliant LLM outputs
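The hot-reloading contribution can be sketched as a verify-before-swap store. This is an assumed minimal design, not the paper's implementation: the `RuleStore` class, its `hot_reload` method, and the toy verifier are all hypothetical names introduced here for illustration. The invariant it demonstrates is the one the proof relies on: the agent only ever reads a rule set that passed verification.

```python
import threading

class RuleStore:
    """Sketch of dynamic rule verification with hot reloading (assumed design).

    A candidate rule set is swapped in atomically only after it passes the
    verifier, so readers never observe an unverified set."""

    def __init__(self, initial_rules, verifier):
        if not verifier(initial_rules):
            raise ValueError("initial rule set failed verification")
        self._verifier = verifier
        self._lock = threading.Lock()
        self._rules = list(initial_rules)

    def hot_reload(self, new_rules):
        """Install new_rules only if verification succeeds; else keep current set."""
        if not self._verifier(new_rules):
            return False  # rejected: the previously verified set stays live
        with self._lock:
            self._rules = list(new_rules)
        return True

    def current(self):
        with self._lock:
            return list(self._rules)

# Toy verifier: every rule must be a non-empty string.
store = RuleStore(["deny PII exfiltration"],
                  lambda rs: all(isinstance(r, str) and r for r in rs))
assert store.hot_reload(["deny PII exfiltration", ""]) is False      # rejected
assert store.hot_reload(["deny PII exfiltration", "log tool calls"]) is True
```

Rejecting a failed candidate while keeping the prior verified set live is what lets rules evolve without ever lowering the compliance guarantee mid-session.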