Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection
Ariana Yi 1, Ce Zhou 2, Liyang Xiao 3, Qiben Yan 3
Published on arXiv
2510.19574
Input Manipulation Attack
OWASP ML Top 10 — ML01
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
α-Cloak achieves 100% attack success rate across all tested models (5 object detectors, 1 VLM, Gemini-2.0-Flash) with no perceptible artifacts, exploiting alpha channel rendering discrepancies between human viewers and ML models.
α-Cloak
Novel technique introduced
As object detection models are increasingly deployed in cyber-physical systems such as autonomous vehicles (AVs) and surveillance platforms, ensuring their security against adversarial threats is essential. While prior work has explored adversarial attacks in the image domain, attacks in the video domain remain largely unexamined, especially in the no-box setting. In this paper, we present α-Cloak, the first no-box adversarial attack on object detectors that operates entirely through the alpha channel of RGBA videos. α-Cloak exploits the alpha channel to fuse a malicious target video with a benign video, producing a fused video that appears innocuous to human viewers but consistently fools object detectors. Our attack requires no access to model architecture, parameters, or outputs, and introduces no perceptible artifacts. We systematically study the support for alpha channels across common video formats and playback applications, and design a fusion algorithm that ensures visual stealth and compatibility. We evaluate α-Cloak on five state-of-the-art object detectors, a vision-language model, and a multi-modal large language model (Gemini-2.0-Flash), demonstrating a 100% attack success rate across all scenarios. Our findings reveal a previously unexplored vulnerability in video-based perception systems, highlighting the urgent need for defenses that account for the alpha channel in adversarial settings.
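The rendering discrepancy the abstract relies on can be illustrated with a minimal NumPy sketch. This is not the paper's fusion algorithm, which is not specified in this summary. It only shows, under stated assumptions, why the same RGBA frame can look different to a viewer whose player composites the alpha channel and to a pipeline that discards it. The function names, the tiny 2x2 frame, and the pixel values are all hypothetical.

```python
import numpy as np

def composite_over(rgba, background):
    """Standard 'over' compositing: roughly what a player that honors alpha displays."""
    rgb = rgba[..., :3].astype(np.float64)
    a = rgba[..., 3:4].astype(np.float64) / 255.0  # alpha scaled to [0, 1]
    out = a * rgb + (1.0 - a) * background.astype(np.float64)
    return np.rint(out).astype(np.uint8)

def drop_alpha(rgba):
    """What a preprocessing step sees if it simply discards the alpha channel."""
    return rgba[..., :3]

# Hypothetical 2x2 frame: bright content stored in RGB, hidden by alpha = 0.
stored_rgb = np.full((2, 2, 3), 200, dtype=np.uint8)
alpha = np.zeros((2, 2), dtype=np.uint8)            # fully transparent (extreme case)
rgba = np.dstack([stored_rgb, alpha])                # shape (2, 2, 4)

background = np.full((2, 2, 3), 30, dtype=np.uint8)  # what the player draws underneath

human_view = composite_over(rgba, background)  # viewer sees only the background
model_view = drop_alpha(rgba)                  # pipeline sees the stored RGB content
```

In this extreme case the two views share no pixel values. A real attack would need intermediate alpha values and a carefully chosen benign layer to stay visually convincing, which is the role the abstract assigns to the fusion algorithm.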
Key Contributions
- First no-box adversarial attack on video object detectors operating entirely through alpha channel manipulation in RGBA videos, requiring zero model access
- Systematic cross-platform study of alpha channel support across video formats and playback applications, with a stealth-preserving fusion algorithm
- 100% attack success rate demonstrated on 5 SOTA object detectors, a VLM, and Gemini-2.0-Flash with no perceptible artifacts
🛡️ Threat Analysis
α-Cloak crafts adversarial video inputs via alpha channel fusion that consistently cause object detectors to fail at inference time, requiring zero access to model architecture, parameters, or outputs — a no-box evasion attack achieving 100% success across 5 SOTA object detectors.
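Because the attack depends on the model and the viewer interpreting the alpha channel differently, one possible precaution is to flag frames with non-opaque alpha and flatten them before inference, so the detector receives what a compositing player would display. The sketch below is an assumption-laden illustration, not a defense described or evaluated in the source. The function names and the choice of a uniform background value are hypothetical.

```python
import numpy as np

def has_nonopaque_alpha(frame):
    """True if the frame has a 4th channel containing any value below 255."""
    return frame.ndim == 3 and frame.shape[-1] == 4 and bool((frame[..., 3] < 255).any())

def flatten_rgba(frame, background_value=0):
    """Composite an RGBA frame over a uniform background (assumed value)."""
    rgb = frame[..., :3].astype(np.float64)
    a = frame[..., 3:4].astype(np.float64) / 255.0
    out = a * rgb + (1.0 - a) * background_value
    return np.rint(out).astype(np.uint8)

def prepare_for_detector(frame, background_value=0):
    """Return (rgb_frame, flagged): flatten suspicious frames, pass others through."""
    if has_nonopaque_alpha(frame):
        return flatten_rgba(frame, background_value), True
    return frame[..., :3], False

# Hypothetical usage on a frame whose RGB content is hidden by alpha = 0.
hidden = np.dstack([np.full((2, 2, 3), 200, dtype=np.uint8),
                    np.zeros((2, 2), dtype=np.uint8)])
rgb_frame, flagged = prepare_for_detector(hidden)
```

The flag can be logged or used to reject the input outright. The right background for flattening depends on the playback environment, which this sketch does not model.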