Mantas Mazeika

h-index: 26 19,356 citations 51 papers (total)

Papers in Database (2)

attack arXiv Jan 3, 2026 · Jan 2026

Aggressive Compression Enables LLM Weight Theft

Davis Brown, Juan-Pablo Rivera, Dan Hendrycks et al. · University of Pennsylvania · Georgia Institute of Technology +1 more

Aggressive compression of LLM weights reduces datacenter exfiltration time from months to days, enabling practical weight theft attacks

Model Theft Model Theft nlp
PDF
benchmark arXiv Oct 31, 2025 · Oct 2025

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Boyi Wei, Zora Che, Nathaniel Li et al. · Scale AI · Princeton University +3 more

Benchmark framework reveals bio-foundation model safety filtering is bypassable via fine-tuning, with dual-use signals persisting in pretrained representations

Transfer Learning Attack generative
PDF