
Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them

Mohamed Nabeel, Oleksii Starov

0 citations · 13 references · arXiv


Published on arXiv · 2601.04553

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Key Finding

Existing model hub scanners (Hugging Face, TensorFlow Hub) cannot detect stealthy TensorFlow API abuse attacks; LLM-based semantic analysis of API functionality can surface previously undetected attack vectors.

Novel technique introduced: LLM-based DL API abuse scanner


According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. To reduce cost and foster innovation, pre-trained models are often fetched from model hubs such as Hugging Face or TensorFlow Hub. However, this introduces a security risk: attackers can inject malicious code into the models they upload to these hubs, enabling attacks including remote code execution (RCE), sensitive data exfiltration, and system file modification when the models are loaded or executed (e.g., via the predict function). Since AI models play a critical role in digital transformation, this would drastically increase the number of software supply chain attacks. While there are several efforts at detecting malware hidden in pickle-based saved models during deserialization (malware embedded in model parameters), the risk of abusing DL APIs (e.g., TensorFlow APIs) is understudied. Specifically, we show how one can abuse hidden functionalities of TensorFlow APIs, such as file read/write and network send/receive, along with their persistence APIs, to launch attacks. It is concerning that existing scanners in model hubs like Hugging Face and TensorFlow Hub are unable to detect some of the stealthy abuses of such APIs: the scanning tools analyse only a syntactically identified set of suspicious functions and lack a semantic-level understanding of the functionality being used. After demonstrating the possible attacks, we show how one may identify potentially abusable hidden API functionalities using LLMs and build scanners to detect such abuses.
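As a concrete illustration of the API-abuse pattern the abstract describes, the following is a minimal sketch (not code from the paper) of how a TensorFlow file-read API could ride along inside an otherwise benign model. The target path and the wrapping function are hypothetical; the point is that a legitimate TF API executes as a hidden side effect of inference:

```python
import tensorflow as tf

def looks_benign(x):
    # tf.io.read_file is a legitimate TF API, but here it runs as a hidden
    # side effect whenever the model executes; "/etc/hostname" is an
    # illustrative target path, not one taken from the paper.
    leaked = tf.io.read_file("/etc/hostname")
    # Folding the payload into the computation keeps it in the graph.
    _ = tf.strings.length(leaked)
    return x  # the layer's output is unchanged, so predictions look normal

# A model carrying the hidden file read; saving and uploading it to a
# model hub would ship the payload to anyone who loads and runs it.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Lambda(looks_benign),
])
```

Because the visible output is untouched, the model behaves identically to a clean one, which is what makes this class of abuse stealthy against behavioral checks.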


Key Contributions

  • Demonstrates novel attack vectors abusing hidden TensorFlow API functionalities (file read/write, network send/receive, persistence APIs) to embed RCE and data exfiltration payloads into model files distributed via model hubs
  • Shows that existing syntactic scanners on Hugging Face and TensorFlow Hub fail to detect these stealthy API abuse attacks due to a lack of semantic understanding
  • Proposes an LLM-based semantic analysis approach to identify potentially abusable DL API functionalities and build more effective model hub scanners
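The third contribution can be sketched as follows; this is a hypothetical reconstruction, not the authors' implementation. The idea is to replace a fixed syntactic denylist with a semantic capability classification of each API obtained from an LLM; here a hand-written lookup stands in for the LLM query:

```python
# Capabilities considered abusable in the supply-chain setting.
RISKY = {"file_read", "file_write", "net_send", "net_recv", "persistence"}

def llm_capability_oracle(api_name):
    """Stand-in for an LLM prompt such as 'List hidden side-effect
    capabilities of <api_name>'. A real scanner would query an LLM and
    parse its answer; this lookup only simulates plausible responses."""
    simulated = {
        "tf.io.read_file": {"file_read"},
        "tf.io.write_file": {"file_write"},
        "tf.matmul": set(),
        "tf.nn.relu": set(),
    }
    return simulated.get(api_name, set())

def scan_model_apis(api_names):
    """Flag every API whose semantic capabilities intersect RISKY."""
    findings = {}
    for name in api_names:
        caps = llm_capability_oracle(name) & RISKY
        if caps:
            findings[name] = sorted(caps)
    return findings

report = scan_model_apis(["tf.matmul", "tf.nn.relu", "tf.io.read_file"])
print(report)  # {'tf.io.read_file': ['file_read']}
```

The design point is that the oracle reasons about what an API *can do*, so an API never seen in a denylist can still be flagged if the LLM recognizes a hidden file or network capability.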

🛡️ Threat Analysis

AI Supply Chain Attacks

The paper's core contribution is demonstrating attacks via trojaned models distributed on model hubs (Hugging Face, TensorFlow Hub) — a textbook ML supply chain attack. Attackers abuse DL framework APIs (TF file I/O, network send/receive, persistence APIs) to embed malware that executes at model load/inference time, exactly matching 'Trojaned pre-trained models on model hubs.' The detection tool addresses the supply chain vulnerability directly.
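On the detection side, one place a graph-level scanner can look is the set of op types recorded in a traced function or SavedModel graph. The sketch below is an assumption about how such a check might work (the payload path is hypothetical, and this is not the paper's scanner): tracing a function that hides a `tf.io.read_file` call leaves a `ReadFile` op in the graph, which a scanner can flag.

```python
import tensorflow as tf

@tf.function
def trojaned(x):
    # Hypothetical hidden payload; "/tmp/secret" is an illustrative path.
    _ = tf.strings.length(tf.io.read_file("/tmp/secret"))
    return x * 2.0

# Tracing records every op, including the hidden file read, in the graph.
graph = trojaned.get_concrete_function(
    tf.TensorSpec(shape=[None], dtype=tf.float32)).graph
op_types = {op.type for op in graph.get_operations()}

# A scanner would flag file/network op types surfacing in a model graph.
suspicious = op_types & {"ReadFile", "WriteFile"}
print(sorted(suspicious))
```

Note this only catches abuse that manifests as distinct graph ops; the paper's broader point is that a purely syntactic list of such ops is incomplete without semantic knowledge of what each API can do.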


Details

Threat Tags
training_time · digital
Applications
model hub security · ML supply chain security · deep learning framework security