Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
Songping Wang, Qinglong Liu, Yueming Lyu, Ningyuan Li, Ziwen He, Caifeng Shan
Published on arXiv
2602.01369
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Joint Temporal Lipschitz Adversarial Training (J-TLAT) consistently enhances adversarial robustness across diverse datasets and architectures, while the plug-and-play framework reduces inference cost by more than 60% compared with dense models.
J-TLGA / J-TLAT
Novel techniques introduced
Mixture-of-Experts (MoE) has demonstrated strong performance in video understanding tasks, yet its adversarial robustness remains underexplored. Existing attack methods often treat MoE as a unified architecture, overlooking the independent and collaborative weaknesses of key components such as routers and expert modules. To fill this gap, we propose Temporal Lipschitz-Guided Attacks (TLGA) to thoroughly investigate component-level vulnerabilities in video MoE models. We first design attacks on the router, revealing its independent weaknesses. Building on this, we introduce Joint Temporal Lipschitz-Guided Attacks (J-TLGA), which collaboratively perturb both routers and experts. This joint attack significantly amplifies adversarial effects and exposes the Achilles' Heel (collaborative weaknesses) of the MoE architecture. Based on these insights, we further propose Joint Temporal Lipschitz Adversarial Training (J-TLAT). J-TLAT performs joint training to further defend against collaborative weaknesses, enhancing component-wise robustness. Our framework is plug-and-play and reduces inference cost by more than 60% compared with dense models. It consistently enhances adversarial robustness across diverse datasets and architectures, effectively mitigating both the independent and collaborative weaknesses of MoE.
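The paper's attack objective is not reproduced here. As a rough, hypothetical PyTorch-style sketch of the kind of joint router-and-expert perturbation the abstract describes, the PGD loop below combines a misclassification loss (expert side) with a KL term that pushes the router's gating distribution away from its clean assignment. The interface `model(video) -> (class_logits, router_logits)`, the hyperparameters, and the loss weighting are all illustrative assumptions; the Temporal Lipschitz guidance that actually defines TLGA/J-TLGA is omitted.

```python
import torch
import torch.nn.functional as F

def joint_pgd_attack(model, video, label, eps=8/255, alpha=2/255, steps=10, lam=1.0):
    """Illustrative joint attack sketch: perturb the video so that both the
    expert-side prediction and the router's gating distribution are disrupted.
    Assumes model(video) returns (class_logits, router_logits)."""
    # Reference gate distribution from the clean input (router side).
    with torch.no_grad():
        _, clean_router_logits = model(video)
        clean_gates = F.softmax(clean_router_logits, dim=-1)

    adv = video.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        class_logits, router_logits = model(adv)
        log_gates = F.log_softmax(router_logits, dim=-1)

        # Expert-side term: standard misclassification objective.
        cls_loss = F.cross_entropy(class_logits, label)
        # Router-side term: KL divergence pushing the gating away from clean routing.
        route_loss = F.kl_div(log_gates, clean_gates, reduction="batchmean")
        loss = cls_loss + lam * route_loss  # ascend on both terms jointly

        grad = torch.autograd.grad(loss, adv)[0]
        # Gradient-ascent step, then projection onto the L-infinity ball around the clean video.
        adv = adv.detach() + alpha * grad.sign()
        adv = video + torch.clamp(adv - video, min=-eps, max=eps)
        adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```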
Key Contributions
- TLGA/J-TLGA: first component-level adversarial attacks targeting routers and experts in video MoE, using Temporal Lipschitz-guided perturbations to reveal collaborative weaknesses
- J-TLAT: joint adversarial training defense that hierarchically hardens router and expert components, mitigating both independent and collaborative vulnerabilities
- Plug-and-play framework that reduces inference cost by over 60% versus dense models while consistently improving adversarial robustness across architectures and datasets
🛡️ Threat Analysis
TLGA and J-TLGA are gradient-based adversarial perturbation attacks that cause misclassification at inference time; J-TLAT is the adversarial training proposed as a defense. Both the attacks and the defense map directly to ML01 (Input Manipulation Attack).
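For orientation only, the sketch below shows how an adversarial-training defense in the spirit of J-TLAT could be wired into a training loop, assuming the same hypothetical `model` interface and an inner attack such as the `joint_pgd_attack` sketch above. The paper's hierarchical router/expert hardening and its Temporal Lipschitz regularization are not reproduced.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, attack_fn):
    """Illustrative adversarial-training epoch: craft joint adversarial videos
    with attack_fn (e.g. the joint_pgd_attack sketch) and update the sparse
    MoE model on them. Component-specific regularizers are omitted."""
    for video, label in loader:
        # Craft adversarial examples against the current model parameters.
        model.eval()
        adv_video = attack_fn(model, video, label)

        # Train on the adversarial examples.
        model.train()
        optimizer.zero_grad()
        class_logits, _ = model(adv_video)  # assumed (class_logits, router_logits)
        loss = F.cross_entropy(class_logits, label)
        loss.backward()
        optimizer.step()
```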