
Breaking SafetyCore: Exploring the Risks of On-Device AI Deployment

Victor Guyomard, Mathis Mauvisseau, Marie Paindavoine



Published on arXiv: 2509.06371

Model Theft (OWASP ML Top 10: ML05)

Input Manipulation Attack (OWASP ML Top 10: ML01)

Key Finding

Successfully extracts the SafetyCore on-device model and generates adversarial images that bypass its nudity/sensitive-content detection, rendering Google Messages' content-moderation protection ineffective.


Due to hardware and software improvements, an increasing number of AI models are deployed on-device. This shift enhances privacy and reduces latency, but also introduces security risks distinct from traditional software. In this article, we examine these risks through the real-world case study of SafetyCore, an Android system service incorporating sensitive image content detection. We demonstrate how the on-device AI model can be extracted and manipulated to bypass detection, effectively rendering the protection ineffective. Our analysis exposes vulnerabilities of on-device AI models and provides a practical demonstration of how adversaries can exploit them.


Key Contributions

  • First practical demonstration of model extraction from Google's SafetyCore on-device Android AI system via reverse engineering of the APK
  • End-to-end attack pipeline: extract on-device model, convert it to a white-box target, then generate adversarial images to evade sensitive content detection
  • Systematic analysis of why on-device AI deployment introduces security risks distinct from traditional software protections

🛡️ Threat Analysis

Input Manipulation Attack

After extracting the model (turning a black-box target into a white-box one), the authors craft adversarial images that cause misclassification at inference time, bypassing nudity/sensitive-content detection: a textbook input manipulation attack, which the paper's keywords list as "Adversarial examples".
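To illustrate the white-box evasion step, here is a minimal sketch of a gradient-based attack in the FGSM style (Goodfellow et al.). The tiny logistic "detector" and all weights below are stand-ins, not SafetyCore's actual model; the point is only that once the weights are known, the input can be stepped against the gradient until the flagged decision flips.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "sensitive content" detector: logistic regression on a
# flattened 64-pixel image. White-box assumption: w and b are known
# because the model was extracted.
w = rng.normal(size=64)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def detect(x):
    """Probability the image is flagged as sensitive."""
    return sigmoid(w @ x + b)

# An image the detector currently flags (aligned with w, so score > 0.5).
x = w / np.linalg.norm(w)

# FGSM-style step: move each pixel opposite the sign of the gradient of
# the "flagged" score. For this model, d(detect)/dx is proportional to w,
# so the sign of the gradient is sign(w).
eps = 0.3
x_adv = np.clip(x - eps * np.sign(w), -1.0, 1.0)

print("original score:", detect(x))
print("adversarial score:", detect(x_adv))
```

The same loop structure carries over to a real extracted model: replace the hand-computed gradient with automatic differentiation through the recovered network.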

Model Theft

The paper demonstrates reverse-engineering the SafetyCore Android system service APK and extracting the AI model embedded in it, giving an adversary direct access to the model's architecture and weights: a clear case of model theft via extraction.
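The first step of such an extraction can be sketched as follows. An APK is a ZIP archive, so embedded model files can be located by walking its entries; TFLite flatbuffers carry the identifier `TFL3` at byte offset 4. The entry names and layout below are hypothetical, not SafetyCore's actual structure, and this is only a minimal sketch of locating candidate models, not the paper's full pipeline.

```python
import zipfile

def find_tflite_models(apk):
    """Return APK entries that look like TFLite models.

    `apk` may be a path or a file-like object (zipfile accepts both).
    Matches on the TFLite flatbuffer identifier "TFL3" at offset 4,
    or on a .tflite file extension as a fallback.
    """
    hits = []
    with zipfile.ZipFile(apk) as z:
        for name in z.namelist():
            with z.open(name) as f:
                header = f.read(8)
            if header[4:8] == b"TFL3" or name.endswith(".tflite"):
                hits.append(name)
    return hits
```

Once a candidate entry is found, it can be unpacked with `ZipFile.extract` and loaded into an interpreter for offline, white-box analysis.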


Details

Domains
vision
Model Types
cnn
Threat Tags
white_box, black_box, inference_time, targeted, digital
Applications
sensitive content detection, content moderation, on-device AI