Breaking SafetyCore: Exploring the Risks of On-Device AI Deployment
Victor Guyomard, Mathis Mauvisseau, Marie Paindavoine
Published on arXiv
arXiv:2509.06371
Model Theft
OWASP ML Top 10 — ML05
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
The paper successfully extracts the SafetyCore on-device model and generates adversarial images that bypass nudity/sensitive-content detection, rendering the content-moderation protection in Google Messages ineffective
Due to hardware and software improvements, an increasing number of AI models are deployed on-device. This shift enhances privacy and reduces latency, but also introduces security risks distinct from traditional software. In this article, we examine these risks through the real-world case study of SafetyCore, an Android system service incorporating sensitive image content detection. We demonstrate how the on-device AI model can be extracted and manipulated to bypass detection, effectively rendering the protection ineffective. Our analysis exposes vulnerabilities of on-device AI models and provides a practical demonstration of how adversaries can exploit them.
Key Contributions
- First practical demonstration of model extraction from Google's SafetyCore on-device Android AI system via reverse engineering of the APK
- End-to-end attack pipeline: extract on-device model, convert it to a white-box target, then generate adversarial images to evade sensitive content detection
- Systematic analysis of why on-device AI deployment introduces security risks distinct from traditional software protections
🛡️ Threat Analysis
After extracting the model (moving from a black-box to a white-box setting), the paper crafts adversarial images that cause misclassification at inference time, bypassing nudity/sensitive content detection. This is a textbook input manipulation attack via adversarial examples.
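The paper's actual attack on the extracted detector is not reproduced here. As a minimal sketch of the white-box idea, the toy example below uses a single FGSM-style step against a hypothetical logistic "detector" (a stand-in for the real model): with the weights in hand, the gradient of the detection score with respect to the input is known analytically, and stepping against its sign lowers the score while keeping the perturbation bounded by `eps`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def detector_score(x, w, b):
    """Toy stand-in for the sensitive-content detector: a logistic
    model whose output is the 'sensitive' probability. (Hypothetical;
    the real SafetyCore model is a deep image classifier.)"""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm_evade(x, w, b, eps=0.1):
    """One FGSM step that lowers the detection score.

    For this logistic model, d(score)/d(x_i) = score*(1-score)*w_i,
    so the gradient's sign is simply sign(w_i); we step against it,
    bounding each coordinate's perturbation by eps.
    """
    s = detector_score(x, w, b)
    grad_sign = [math.copysign(1.0, s * (1.0 - s) * wi) for wi in w]
    return [xi - eps * g for xi, g in zip(x, grad_sign)]
```

Against the real extracted model, the same loop would use automatic differentiation through the converted network and typically iterate (PGD-style) rather than take a single step, but the evasion principle is identical.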
The paper demonstrates reverse engineering the SafetyCore Android system service APK and extracting the AI model embedded in it, giving an adversary direct access to the model's architecture and weights: a clear case of model theft via extraction.
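The paper's exact reverse-engineering workflow is not reproduced here, but the starting point generalizes: an APK is an ordinary ZIP archive, so embedded model assets can be located and copied out with standard tooling before any Android-specific analysis. A minimal sketch, assuming common model file extensions (the actual SafetyCore asset names and format are not shown here):

```python
import zipfile

# Extensions commonly used for models bundled in Android apps.
# Assumption for illustration; not SafetyCore's actual file names.
MODEL_EXTENSIONS = (".tflite", ".pb", ".onnx", ".bin")

def find_embedded_models(apk_path):
    """Return paths of likely model files inside an APK.

    An APK is a ZIP archive, so its assets can be enumerated
    without any Android-specific tooling.
    """
    with zipfile.ZipFile(apk_path) as apk:
        return [name for name in apk.namelist()
                if name.lower().endswith(MODEL_EXTENSIONS)]

def extract_model(apk_path, member, out_dir="."):
    """Copy one embedded model file out of the APK for offline
    analysis (inspection, conversion to a white-box target, etc.)."""
    with zipfile.ZipFile(apk_path) as apk:
        return apk.extract(member, path=out_dir)
```

In practice the dumped file may be encrypted or in a proprietary container, in which case further reverse engineering of the loading code is needed before it can be turned into a white-box target.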