Trusting What You Cannot See: Auditable Fine-Tuning and Inference for Proprietary AI

Heng Jin 1, Chaoyu Zhang 1, Hexuan Yu 1, Shanghao Shi 2, Ning Zhang 1, Y. Thomas Hou 1, Wenjing Lou 2

Published on arXiv (arXiv:2603.07466)

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

AFTUNE achieves practical computation overhead while enabling selective and efficient client-side verification that cloud providers faithfully executed contracted LLM fine-tuning and inference.

AFTUNE

Novel technique introduced


Cloud-based infrastructures have become the dominant platform for deploying large models, particularly large language models (LLMs). Fine-tuning and inference are increasingly delegated to cloud providers for simplified deployment and access to proprietary models, yet this creates a fundamental trust gap: although cryptographic and TEE-based verification methods exist, the scale of modern LLMs renders them prohibitively expensive, leaving clients unable to practically audit these processes. This lack of transparency creates concrete security risks that can silently compromise service integrity. We present AFTUNE, an auditable and verifiable framework that ensures the computational integrity of cloud-based fine-tuning and inference. AFTUNE incorporates a lightweight recording and spot-check mechanism that produces verifiable traces of execution, enabling clients to later audit whether the training and inference processes followed the agreed configurations. Our evaluation shows that AFTUNE imposes practical computation overhead while enabling selective and efficient verification, demonstrating that trustworthy model services are achievable in today's cloud environments.
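To make the recording-and-spot-check idea concrete, here is a minimal sketch of how such a scheme can work in principle. This is not AFTUNE's actual protocol; the function names, the hash-chained trace layout, and the modeling of a fine-tuning step as a deterministic hash update are all illustrative assumptions. The provider logs a commitment per step; the client later re-executes a randomly sampled step and checks it against the trace.

```python
import hashlib

def step(state: bytes, batch: bytes) -> bytes:
    # Stand-in for one deterministic fine-tuning step (hypothetical):
    # in reality this would be an optimizer update over model weights.
    return hashlib.sha256(state + batch).digest()

def commit(prev_commit: bytes, state: bytes) -> bytes:
    # Hash-chained commitment: each entry binds the new state to the
    # previous commitment, so the trace cannot be rewritten piecemeal.
    return hashlib.sha256(prev_commit + state).digest()

GENESIS = b"\x00" * 32  # initial chain value (assumed convention)

def provider_run(init_state: bytes, batches: list[bytes]):
    """Provider side: execute all steps, recording (state, commitment) per step."""
    trace, state, c = [], init_state, GENESIS
    for batch in batches:
        state = step(state, batch)
        c = commit(c, state)
        trace.append((state, c))
    return trace

def spot_check(init_state: bytes, batches: list[bytes], trace, i: int) -> bool:
    """Client side: re-execute step i from the recorded predecessor and compare."""
    prev_state = init_state if i == 0 else trace[i - 1][0]
    prev_c = GENESIS if i == 0 else trace[i - 1][1]
    expected_state = step(prev_state, batches[i])
    return trace[i] == (expected_state, commit(prev_c, expected_state))
```

A client who trusts the agreed configuration (initial state and batch schedule) can audit the trace selectively: checking a few random indices catches a provider who deviated at any sampled step, without re-running the whole job.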


Key Contributions

  • AFTUNE: a lightweight recording and spot-check mechanism that produces verifiable execution traces for cloud-based LLM fine-tuning and inference
  • Enables clients to audit whether cloud providers faithfully executed agreed training and inference configurations without prohibitive cryptographic overhead
  • Demonstrates practical overhead while achieving selective and efficient verification of cloud LLM services

🛡️ Threat Analysis

Output Integrity Attack

AFTUNE provides verifiable inference schemes and auditable execution traces — exactly what ML09 covers under 'verifiable inference schemes (proving outputs weren't tampered with)'. The threat is a dishonest cloud provider silently deviating from contracted computations, producing outputs that don't reflect the agreed model or training process. The defense is cryptographic trace-based verification of both inference outputs and fine-tuning execution.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
training_time, inference_time, grey_box
Applications
cloud llm fine-tuning services, cloud llm inference services, proprietary model apis