
IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation

Yanpei Guo 1, Wenjie Qu 1, Linyu Wu 1, Shengfang Zhai 1, Lionel Z. Wang 2, Ming Xu 1, Yue Liu 1, Binhang Yuan, Dawn Song 3, Jiaheng Zhang 1


Published on arXiv: 2602.22700

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

IMMACULATE reliably detects model substitution, quantization abuse, and token overbilling with under 1% throughput overhead on both dense and MoE LLMs

IMMACULATE

Novel technique introduced


Commercial large language models are typically deployed as black-box API services, requiring users to trust providers to execute inference correctly and report token usage honestly. We present IMMACULATE, a practical auditing framework that detects economically motivated deviations (such as model substitution, quantization abuse, and token overbilling) without trusted hardware or access to model internals. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead. Experiments on dense and MoE models show that IMMACULATE reliably distinguishes benign from malicious executions with under 1% throughput overhead. Our code is published at https://github.com/guo-yanpei/Immaculate.


Key Contributions

  • Threat model formalizing economically motivated LLM provider deviations: model substitution, quantization abuse, and token overbilling
  • Selective auditing protocol using verifiable computation that amortizes cryptographic overhead across requests to achieve under 1% throughput overhead
  • Empirical demonstration on dense and MoE LLMs that IMMACULATE reliably distinguishes benign from malicious executions without trusted hardware or model internals
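The detection guarantee of selective auditing follows from a simple probabilistic argument: if each request is independently audited with probability p, a provider that deviates on k responses escapes detection only if all k evade the sample, which happens with probability (1 - p)^k. The sketch below illustrates this (it is a generic illustration of the sampling argument, not the paper's actual protocol; the function names and the Monte Carlo setup are my own):

```python
import random

def detection_probability(audit_rate: float, num_deviations: int) -> float:
    """Probability that at least one of `num_deviations` dishonest
    responses is audited, when each request is independently audited
    with probability `audit_rate`."""
    return 1.0 - (1.0 - audit_rate) ** num_deviations

def simulate_audit(audit_rate: float, num_requests: int,
                   cheat_fraction: float, seed: int = 0) -> bool:
    """Monte Carlo run (hypothetical setup): the provider deviates on a
    random `cheat_fraction` of requests; the auditor samples each request
    independently. Returns True iff some deviated request was audited."""
    rng = random.Random(seed)
    detected = False
    for _ in range(num_requests):
        cheated = rng.random() < cheat_fraction
        audited = rng.random() < audit_rate
        if cheated and audited:
            detected = True
    return detected
```

Even a 1% audit rate drives the escape probability down geometrically: after 500 deviated responses, detection probability already exceeds 99%, which is why amortizing the verifiable-computation cost over a small audited fraction can still deter systematic cheating.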

🛡️ Threat Analysis

Output Integrity Attack

IMMACULATE is a verifiable inference scheme that detects when LLM API outputs have been tampered with or produced dishonestly (wrong model, wrong precision, fraudulent token counts). ML09 explicitly includes 'verifiable inference schemes (proving outputs weren't tampered with)' — this is the paper's core contribution.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
llm api services, commercial llm inference, llm billing verification