
Aggressive Compression Enables LLM Weight Theft

Davis Brown 1, Juan-Pablo Rivera 2, Dan Hendrycks 3, Mantas Mazeika 3

0 citations · 41 references · arXiv


Published on arXiv: 2601.01296

Model Theft (OWASP ML Top 10 — ML05)

Model Theft (OWASP LLM Top 10 — LLM10)

Key Finding

Aggressive compression reduces LLM weight exfiltration time from months to days (16x–100x compression ratio), and forensic watermarking is identified as the most practical and effective post-exfiltration defense.


As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltration attacks where an adversary attempts to sneak model weights out of a datacenter over a network. While exfiltration attacks are multi-step cyber attacks, we demonstrate that a single factor, the compressibility of model weights, significantly heightens exfiltration risk for large language models (LLMs). We tailor compression specifically for exfiltration by relaxing decompression constraints and demonstrate that attackers could achieve 16x to 100x compression with minimal trade-offs, reducing the time it would take for an attacker to illicitly transmit model weights from the defender's server from months to days. Finally, we study defenses designed to reduce exfiltration risk in three distinct ways: making models harder to compress, making them harder to 'find,' and tracking provenance for post-attack analysis using forensic watermarks. While all defenses are promising, the forensic watermark defense is both effective and cheap, and therefore is a particularly attractive lever for mitigating weight-exfiltration risk.


Key Contributions

  • Demonstrates that tailored aggressive compression (relaxing decompression constraints) achieves 16x–100x compression of LLM weights with minimal quality trade-offs, reducing exfiltration time from months to days
  • Introduces an exfiltration-optimized compression methodology that treats decompression cost as irrelevant, unlike standard compression use cases
  • Evaluates three defensive countermeasures — compression resistance, model obfuscation, and forensic watermarking — finding forensic watermarks to be the most cost-effective mitigation
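The "months to days" claim above is a bandwidth calculation: exfiltration time scales inversely with the compression ratio. A minimal sketch of that arithmetic, where the model size and the covert egress rate are illustrative assumptions (not figures from the paper):

```python
# Back-of-envelope exfiltration-time estimate. Model size and egress rate
# below are illustrative assumptions, not numbers taken from the paper.

def exfiltration_days(model_bytes: float, egress_bytes_per_sec: float,
                      compression_ratio: float = 1.0) -> float:
    """Days needed to move model_bytes off-site at a covert egress rate."""
    return model_bytes / compression_ratio / egress_bytes_per_sec / 86_400

TB = 1e12
weights = 2 * TB        # e.g., a ~1T-parameter model at 16-bit precision
covert_rate = 1e5       # 100 KB/s of unnoticed egress (assumption)

print(exfiltration_days(weights, covert_rate))        # ≈ 231 days (months)
print(exfiltration_days(weights, covert_rate, 16))    # ≈ 14.5 days
print(exfiltration_days(weights, covert_rate, 100))   # ≈ 2.3 days
```

Under these assumed numbers, a 16x ratio already collapses the transfer from roughly eight months to two weeks, and 100x brings it to a couple of days, which is the qualitative shift the paper highlights.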

🛡️ Threat Analysis

Model Theft

The paper directly addresses the theft of LLM model weights (the model itself, not its outputs) through exfiltration attacks, and studies defenses including forensic watermarking to prove and track model IP ownership after a theft. This is a core model-theft threat, with model-level watermarking as the corresponding defense.
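To make the forensic-watermark idea concrete, here is a generic correlation-based weight watermark sketch (a standard technique, not the paper's specific scheme; the perturbation size `eps` and the seeds are hypothetical): embed a small keyed pseudorandom perturbation into the weights, then later test a suspect copy by correlating it against the secret key.

```python
import numpy as np

# Generic weight-watermark sketch (illustrative, not the paper's method):
# add a tiny secret-keyed sign pattern to the weights, detect via correlation.

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=1_000_000)   # stand-in weight tensor

SECRET_SEED = 1234                              # watermark key (assumption)
mark = np.random.default_rng(SECRET_SEED).choice([-1.0, 1.0], size=weights.size)
eps = 5e-4                                      # small vs. weight scale
marked = weights + eps * mark

def detect(w: np.ndarray, seed: int) -> float:
    """Normalized correlation of weights w against the keyed sign pattern."""
    m = np.random.default_rng(seed).choice([-1.0, 1.0], size=w.size)
    return float(w @ m) / (np.linalg.norm(w) * np.sqrt(w.size))

print(detect(marked, SECRET_SEED))   # well above the chance level
print(detect(weights, SECRET_SEED))  # ~0: unmarked weights do not match
```

Only the key holder can run this test, which is what makes the watermark "forensic": it supports post-theft attribution without changing how the model is served.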


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
digital
Applications
large language model weight protection, frontier AI model security, datacenter egress security