
Aggressive Compression Enables LLM Weight Theft

Davis Brown 1, Juan-Pablo Rivera 2, Dan Hendrycks 3, Mantas Mazeika 3

0 citations · 41 references · arXiv


Published on arXiv: 2601.01296

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

Aggressive compression reduces LLM weight exfiltration time from months to days (16x–100x compression ratio), and forensic watermarking is identified as the most practical and effective post-exfiltration defense.


As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltration attacks where an adversary attempts to sneak model weights out of a datacenter over a network. While exfiltration attacks are multi-step cyber attacks, we demonstrate that a single factor, the compressibility of model weights, significantly heightens exfiltration risk for large language models (LLMs). We tailor compression specifically for exfiltration by relaxing decompression constraints and demonstrate that attackers could achieve 16x to 100x compression with minimal trade-offs, reducing the time it would take for an attacker to illicitly transmit model weights from the defender's server from months to days. Finally, we study defenses designed to reduce exfiltration risk in three distinct ways: making models harder to compress, making them harder to 'find,' and tracking provenance for post-attack analysis using forensic watermarks. While all defenses are promising, the forensic watermark defense is both effective and cheap, and therefore is a particularly attractive lever for mitigating weight-exfiltration risk.


Key Contributions

  • Demonstrates that tailored aggressive compression (relaxing decompression constraints) achieves 16x–100x compression of LLM weights with minimal quality trade-offs, reducing exfiltration time from months to days
  • Introduces an exfiltration-optimized compression methodology that treats decompression cost as irrelevant, unlike standard compression use cases
  • Evaluates three defensive countermeasures — compression resistance, model obfuscation, and forensic watermarking — finding forensic watermarks to be the most cost-effective mitigation
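The "months to days" claim above is a bandwidth calculation: exfiltration time scales inversely with the compression ratio. A minimal sketch of that arithmetic, where the model size and the covert egress rate are illustrative assumptions (not figures from the paper):

```python
# Back-of-envelope exfiltration-time estimate. Model size and egress rate
# below are illustrative assumptions, not numbers taken from the paper.

def exfiltration_days(model_bytes: float, egress_bytes_per_sec: float,
                      compression_ratio: float = 1.0) -> float:
    """Days needed to move model_bytes off-site at a covert egress rate."""
    return model_bytes / compression_ratio / egress_bytes_per_sec / 86_400

TB = 1e12
weights = 2 * TB        # e.g., a ~1T-parameter model at 16-bit precision
covert_rate = 1e5       # 100 KB/s of unnoticed egress (assumption)

print(exfiltration_days(weights, covert_rate))        # ≈ 231 days (months)
print(exfiltration_days(weights, covert_rate, 16))    # ≈ 14.5 days
print(exfiltration_days(weights, covert_rate, 100))   # ≈ 2.3 days
```

Under these assumed numbers, a 16x ratio already collapses the transfer from roughly eight months to two weeks, and 100x brings it to a couple of days, which is the qualitative shift the paper highlights.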

🛡️ Threat Analysis

Model Theft

The paper directly addresses the theft of LLM model weights (the model itself, not its outputs) through exfiltration attacks, and studies defenses including forensic watermarking to prove and track model IP ownership after a theft. This is a core model-theft threat, with model-level watermarking as the corresponding defense.
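To make the forensic-watermark idea concrete, here is a generic correlation-based weight watermark sketch (a standard technique, not the paper's specific scheme; the perturbation size `eps` and the seeds are hypothetical): embed a small keyed pseudorandom perturbation into the weights, then later test a suspect copy by correlating it against the secret key.

```python
import numpy as np

# Generic weight-watermark sketch (illustrative, not the paper's method):
# add a tiny secret-keyed sign pattern to the weights, detect via correlation.

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=1_000_000)   # stand-in weight tensor

SECRET_SEED = 1234                              # watermark key (assumption)
mark = np.random.default_rng(SECRET_SEED).choice([-1.0, 1.0], size=weights.size)
eps = 5e-4                                      # small vs. weight scale
marked = weights + eps * mark

def detect(w: np.ndarray, seed: int) -> float:
    """Normalized correlation of weights w against the keyed sign pattern."""
    m = np.random.default_rng(seed).choice([-1.0, 1.0], size=w.size)
    return float(w @ m) / (np.linalg.norm(w) * np.sqrt(w.size))

print(detect(marked, SECRET_SEED))   # well above the chance level
print(detect(weights, SECRET_SEED))  # ~0: unmarked weights do not match
```

Only the key holder can run this test, which is what makes the watermark "forensic": it supports post-theft attribution without changing how the model is served.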


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
digital
Applications
large language model weight protection, frontier AI model security, datacenter egress security