
IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation

Yanpei Guo 1, Wenjie Qu 1, Linyu Wu 1, Shengfang Zhai 1, Lionel Z. Wang 2, Ming Xu 1, Yue Liu 1, Binhang Yuan, Dawn Song 3, Jiaheng Zhang 1


Published on arXiv: 2602.22700

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

IMMACULATE reliably detects model substitution, quantization abuse, and token overbilling with under 1% throughput overhead on both dense and MoE LLMs

IMMACULATE

Novel technique introduced


Commercial large language models are typically deployed as black-box API services, requiring users to trust providers to execute inference correctly and report token usage honestly. We present IMMACULATE, a practical auditing framework that detects economically motivated deviations (such as model substitution, quantization abuse, and token overbilling) without trusted hardware or access to model internals. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead. Experiments on dense and MoE models show that IMMACULATE reliably distinguishes benign from malicious executions with under 1% throughput overhead. Our code is published at https://github.com/guo-yanpei/Immaculate.


Key Contributions

  • Threat model formalizing economically motivated LLM provider deviations: model substitution, quantization abuse, and token overbilling
  • Selective auditing protocol using verifiable computation that amortizes cryptographic overhead across requests to achieve under 1% throughput overhead
  • Empirical demonstration on dense and MoE LLMs that IMMACULATE reliably distinguishes benign from malicious executions without trusted hardware or model internals
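The detection guarantee of selective auditing follows from a simple probabilistic argument: if each request is independently audited with probability p, a provider that deviates on k responses escapes detection only if all k evade the sample, which happens with probability (1 - p)^k. The sketch below illustrates this (it is a generic illustration of the sampling argument, not the paper's actual protocol; the function names and the Monte Carlo setup are my own):

```python
import random

def detection_probability(audit_rate: float, num_deviations: int) -> float:
    """Probability that at least one of `num_deviations` dishonest
    responses is audited, when each request is independently audited
    with probability `audit_rate`."""
    return 1.0 - (1.0 - audit_rate) ** num_deviations

def simulate_audit(audit_rate: float, num_requests: int,
                   cheat_fraction: float, seed: int = 0) -> bool:
    """Monte Carlo run (hypothetical setup): the provider deviates on a
    random `cheat_fraction` of requests; the auditor samples each request
    independently. Returns True iff some deviated request was audited."""
    rng = random.Random(seed)
    detected = False
    for _ in range(num_requests):
        cheated = rng.random() < cheat_fraction
        audited = rng.random() < audit_rate
        if cheated and audited:
            detected = True
    return detected
```

Even a 1% audit rate drives the escape probability down geometrically: after 500 deviated responses, detection probability already exceeds 99%, which is why amortizing the verifiable-computation cost over a small audited fraction can still deter systematic cheating.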

🛡️ Threat Analysis

Output Integrity Attack

IMMACULATE is a verifiable inference scheme that detects when LLM API outputs have been tampered with or produced dishonestly (wrong model, wrong precision, fraudulent token counts). ML09 explicitly includes 'verifiable inference schemes (proving outputs weren't tampered with)' — this is the paper's core contribution.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time
Applications
llm api services, commercial llm inference, llm billing verification