Stealing AI Model Weights Through Covert Communication Channels
Valentin Barbaza, Alan Rodrigo Diaz-Rizo, Hassan Aboushady, Spyridon Raptis, Haralampos-G. Stratigopoulos
Published on arXiv
2510.00151
Model Theft
OWASP ML Top 10 — ML05
Key Finding
The attack reconstructed the complete weight matrices of four diverse AI models while preserving their functional accuracy, validating practical model theft through a hardware Trojan covert channel on real hardware.
AI models are often regarded as valuable intellectual property due to the high cost of their development, the competitive advantage they provide, and the proprietary techniques involved in their creation. As a result, AI model stealing attacks pose a serious concern for AI model providers. In this work, we present a novel attack targeting wireless devices equipped with AI hardware accelerators. The attack unfolds in two phases. In the first phase, the victim's device is compromised with a hardware Trojan (HT) designed to covertly leak model weights through a hidden communication channel, without the victim realizing it. In the second phase, the adversary uses a nearby wireless device to intercept the victim's transmission frames during normal operation and incrementally reconstruct the complete weight matrix. The proposed attack is agnostic to both the AI model architecture and the hardware accelerator used. We validate our approach through a hardware-based demonstration involving four diverse AI models of varying types and sizes. We detail the design of the HT and the covert channel, highlighting their stealthy nature. Additionally, we analyze the impact of bit error rates on the reception and propose an error mitigation technique. The effectiveness of the attack is evaluated based on the accuracy of the reconstructed models with stolen weights and the time required to extract them. Finally, we explore potential defense mechanisms.
Key Contributions
- Two-phase attack combining hardware Trojan implantation with wireless frame interception to incrementally reconstruct complete AI model weight matrices
- Architecture-agnostic covert channel design that embeds model weights into normal wireless transmission frames without alerting the victim
- Bit error rate analysis and error mitigation technique for reliable weight reconstruction over a wireless covert channel
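The summary does not say which error mitigation technique the authors use. As a rough illustration of how bit errors on a noisy channel can be reduced at the cost of extraction time, the sketch below assumes a generic repetition scheme with bitwise majority voting. This is a stand-in, not the paper's method. The function names, the 8-bit unsigned weight encoding, and the independent-bit-flip channel model are all hypothetical choices made for the example.

```python
import math
import random


def weights_to_bits(weights, width=8):
    """Unpack unsigned `width`-bit integers into a flat bit list, MSB first.
    (Assumed encoding; the paper's weight format is not given in this summary.)"""
    return [(w >> i) & 1 for w in weights for i in reversed(range(width))]


def bits_to_weights(bits, width=8):
    """Inverse of weights_to_bits: pack each group of `width` bits into an integer."""
    weights = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        weights.append(value)
    return weights


def flip_bits(bits, ber, rng):
    """Assumed channel model: flip each bit independently with probability `ber`."""
    return [bit ^ 1 if rng.random() < ber else bit for bit in bits]


def majority_vote(receptions):
    """Bitwise majority over an odd number of equally long receptions."""
    n = len(receptions)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*receptions)]


def residual_ber(ber, n):
    """Probability that the majority of n independent copies of a bit is wrong (n odd)."""
    return sum(
        math.comb(n, k) * ber**k * (1 - ber) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    original = [12, 200, 7, 255]          # toy quantized weights
    bits = weights_to_bits(original)
    # Receive the same payload five times over the simulated noisy channel.
    receptions = [flip_bits(bits, 0.05, rng) for _ in range(5)]
    recovered = bits_to_weights(majority_vote(receptions))
    print("original :", original)
    print("recovered:", recovered)
    print("residual BER estimate:", residual_ber(0.05, 5))
```

Under this assumed model, three repetitions at a raw bit error rate of 0.1 leave a residual error rate of 0.028 per bit, while tripling the number of frames that must be intercepted. That trade-off between reconstruction accuracy and extraction time is the one the evaluation criteria above refer to.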
🛡️ Threat Analysis
The primary contribution is stealing AI model weights (intellectual property) using a hardware Trojan as a covert exfiltration mechanism. This is model theft via a hardware-based side/covert channel, matching the 'side-channel attacks to extract model parameters' criterion under ML05.