defense 2026

Verifier-Bound Communication for LLM Agents: Certified Bounds on Covert Signaling

Om Tailor

University of Maryland

0 citations

Published on arXiv

2603.00381

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

In strict verifier-bound lanes, decoder advantage is certified at 0.0000 with MI proxy 0.0636 bits, demonstrating that online fail-closed admission semantics provide qualitatively stronger covert-channel bounds than audit-only or bottleneck-only controls.
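To make the two reported quantities concrete, here is a minimal sketch of how a decoder advantage and a mutual-information proxy could be computed from paired samples of hidden payloads and decoder guesses. The definitions are assumptions, not taken from the paper: advantage is taken as accuracy above uniform chance, and the MI proxy as a plug-in empirical estimate. The function names are hypothetical.

```python
import math
from collections import Counter


def decoder_advantage(secrets, guesses, num_classes):
    """Decoder accuracy minus the uniform-guess baseline, floored at zero.

    Assumed definition; the paper's exact formulation may differ.
    """
    correct = sum(s == g for s, g in zip(secrets, guesses))
    accuracy = correct / len(secrets)
    return max(0.0, accuracy - 1.0 / num_classes)


def plugin_mutual_information(xs, ys):
    """Plug-in (empirical) mutual information estimate in bits."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Plug-in estimates of mutual information tend to be biased upward on finite samples, so a small positive MI proxy can appear even when a decoder does no better than chance.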

CLBC (Certified Leakage-Bounded Communication)

Novel technique introduced

Colluding language-model agents can hide coordination in messages that remain policy-compliant at the surface level. We present CLBC, a protocol in which generation and admission are separated: a message is admitted to transcript state only if a small verifier accepts a proof-bound envelope under a pinned predicate $Π$. The predicate binds the policy hash, public randomness schedule, transcript chaining, latent schema constraints, canonical metadata/tool fields, and deterministic rejection codes. We show that this protocol yields an upper bound on transcript leakage in terms of latent leakage plus explicit residual channels, derive adaptive composition guarantees, and state a semantic lower bound that applies when policy-valid alternatives remain choosable. We report empirical evidence: the aggregate evaluation satisfies all prespecified thresholds; strict-lane decoder advantage is bounded at 0.0000 with an MI proxy of 0.0636 bits; adaptive-colluder stress tests remain below attacker thresholds; and baseline separation shows large gaps between reject-by-default semantics and audit-only controls. We further quantify operational tradeoffs: strict full-proof mode has a median turn latency of 27.53 s (p95 28.08 s), while sampled proving reduces non-proved-turn latency to 0.327 ms. The central finding is that bottlenecks alone are insufficient: security claims depend on verifiable admission semantics that are online, deterministic, and fail-closed.
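The admission step described above can be sketched as a small fail-closed gate. This is an illustrative sketch under stated assumptions, not the paper's implementation: it re-checks envelope fields directly instead of verifying a cryptographic proof, and the envelope field names, the rejection codes, the latent schema, and the `Transcript` class are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical rejection codes; the paper's actual codes are not given here.
REJECT_POLICY = "E_POLICY_HASH"
REJECT_CHAIN = "E_CHAIN"
REJECT_RANDOMNESS = "E_RANDOMNESS"
REJECT_SCHEMA = "E_SCHEMA"
REJECT_INTERNAL = "E_INTERNAL"

ALLOWED_LATENT_KEYS = {"intent", "slot"}  # assumed latent schema


def digest(obj) -> str:
    """Hash a canonical JSON encoding (sorted keys, fixed separators)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


@dataclass
class Transcript:
    policy_hash: str
    seed: str
    head: str = "genesis"
    turn: int = 0

    def expected_randomness(self) -> str:
        # Public randomness schedule: derived from a public seed and turn index.
        return digest({"seed": self.seed, "turn": self.turn})

    def check(self, env):
        """Return None on acceptance, else a deterministic rejection code."""
        if env.get("policy_hash") != self.policy_hash:
            return REJECT_POLICY
        if env.get("prev") != self.head:
            return REJECT_CHAIN
        if env.get("randomness") != self.expected_randomness():
            return REJECT_RANDOMNESS
        latent = env.get("latent")
        if not isinstance(latent, dict) or set(latent) != ALLOWED_LATENT_KEYS:
            return REJECT_SCHEMA
        return None

    def admit(self, env):
        """Fail closed: any failed check or error leaves state untouched."""
        try:
            code = self.check(env)
            new_head = digest(env) if code is None else None
        except Exception:
            code, new_head = REJECT_INTERNAL, None
        if code is not None:
            return False, code
        self.head, self.turn = new_head, self.turn + 1
        return True, "OK"
```

The design point mirrored here is that the default outcome is rejection: transcript state advances only on the single path where every bound field matches, and a replayed or malformed envelope gets a fixed code and no state change.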

Key Contributions

CLBC protocol that separates message generation from admission: messages are only admitted to transcript state if a verifier accepts a proof-bound envelope under a pinned predicate binding policy hash, randomness schedule, transcript chaining, latent schema constraints, and tool/metadata fields
Formal upper bound on transcript leakage as latent leakage plus explicit residual channels, with adaptive composition guarantees and a semantic lower bound for tasks with multiple policy-valid alternatives
Empirical evaluation showing a certified decoder advantage of 0.0000 with an MI proxy of 0.0636 bits in strict lanes, alongside latency characterization (median turn latency of 27.53 s in strict full-proof mode vs 0.327 ms for non-proved turns under sampled proving)
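One schematic way to read the leakage bound named in the contributions above is shown below. The notation is an assumption introduced here for illustration and is not the paper's: $S$ is the hidden payload, $T_{1:n}$ the admitted transcript over $n$ turns, $Z_{1:n}$ the schema-constrained latents, $\mathcal{R}$ the set of explicit residual channels, and $\varepsilon_{c,t}$ a per-turn budget for residual channel $c$.

$$
I(S;\, T_{1:n}) \;\le\; I(S;\, Z_{1:n}) \;+\; \sum_{t=1}^{n} \sum_{c \in \mathcal{R}} \varepsilon_{c,t}
$$

Under this reading, the first term is the latent leakage, the double sum collects the residual channels, and summing over turns is a simple stand-in for how per-turn budgets might compose across an adaptive interaction.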

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

Applications

llm multi-agent systems, agentic ai frameworks, tool-augmented llm pipelines

Similar Papers

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Password-Activated Shutdown Protocols for Misaligned Frontier Agents

How does information access affect LLM monitors' ability to detect sabotage?

Throttling Web Agents Using Reasoning Gates

Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI

Basic Legibility Protocols Improve Trusted Monitoring

Agentic Misalignment: How LLMs Could Be Insider Threats

NEST: Nascent Encoded Steganographic Thoughts