Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts

Songping Wang 1, Qinglong Liu 1, Yueming Lyu 1, Ningyuan Li 2, Ziwen He 3, Caifeng Shan 1

1 citation · 65 references · arXiv (Cornell University)

Published on arXiv

2602.01369

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Joint adversarial training (J-TLAT) consistently enhances adversarial robustness across diverse datasets and architectures while reducing inference cost by more than 60% compared to dense models.

J-TLGA / J-TLAT

Novel technique introduced


Mixture-of-Experts (MoE) has demonstrated strong performance in video understanding tasks, yet its adversarial robustness remains underexplored. Existing attack methods often treat MoE as a unified architecture, overlooking the independent and collaborative weaknesses of key components such as routers and expert modules. To fill this gap, we propose Temporal Lipschitz-Guided Attacks (TLGA) to thoroughly investigate component-level vulnerabilities in video MoE models. We first design attacks on the router, revealing its independent weaknesses. Building on this, we introduce Joint Temporal Lipschitz-Guided Attacks (J-TLGA), which collaboratively perturb both routers and experts. This joint attack significantly amplifies adversarial effects and exposes the Achilles' Heel (collaborative weaknesses) of the MoE architecture. Based on these insights, we further propose Joint Temporal Lipschitz Adversarial Training (J-TLAT). J-TLAT performs joint training to further defend against collaborative weaknesses, enhancing component-wise robustness. Our framework is plug-and-play and reduces inference cost by more than 60% compared with dense models. It consistently enhances adversarial robustness across diverse datasets and architectures, effectively mitigating both the independent and collaborative weaknesses of MoE.
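To make the "component-level" attack surface concrete, the sketch below shows a generic top-k softmax gating router, the kind of component TLGA targets. The gate sizes, weights, and routing rule here are illustrative assumptions, not the paper's architecture; the point is only that expert selection is a discrete decision computed from the input, so a perturbation near a routing boundary can change which experts fire.

```python
import numpy as np

def top_k_route(token, w_gate, k=2):
    """Softmax gating over experts; return the indices of the top-k experts.

    In a video MoE, a small input perturbation near a routing decision
    boundary can flip which experts are selected -- the router-level
    attack surface that TLGA probes independently of the experts.
    (Toy sizes and weights; not the paper's model.)"""
    logits = token @ w_gate                        # (num_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return set(np.argsort(probs)[-k:].tolist())

rng = np.random.default_rng(0)
w_gate = rng.normal(size=(8, 4))                   # 8-dim token, 4 experts
token = rng.normal(size=8)

clean_experts = top_k_route(token, w_gate)
# Nudge the token along the direction separating two experts' gate vectors;
# depending on how close the token is to the boundary, routing may change.
direction = w_gate[:, 0] - w_gate[:, 1]
perturbed_experts = top_k_route(token + 0.5 * direction / np.linalg.norm(direction), w_gate)
print(clean_experts, perturbed_experts)
```

Because routing is discrete, the experts downstream of a flipped decision never see the token at all, which is why router and expert weaknesses can compound when attacked jointly, as J-TLGA does.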


Key Contributions

  • TLGA/J-TLGA: first component-level adversarial attacks targeting routers and experts in video MoE, using Temporal Lipschitz-guided perturbations to reveal collaborative weaknesses
  • J-TLAT: joint adversarial training defense that hierarchically hardens router and expert components, mitigating both independent and collaborative vulnerabilities
  • Plug-and-play framework that reduces inference cost by over 60% versus dense models while consistently improving adversarial robustness across architectures and datasets

🛡️ Threat Analysis

Input Manipulation Attack

TLGA and J-TLGA are gradient-based adversarial perturbation attacks that cause misclassification at inference time; J-TLAT is adversarial training proposed as a defense. Both the attacks and the defense map directly to ML01 (input manipulation and adversarial robustness).
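As a minimal sketch of the ML01 attack class, the snippet below runs one untargeted FGSM-style step on a toy linear softmax classifier. TLGA/J-TLGA are gradient-based like this, but their temporal Lipschitz guidance and router/expert loss terms are not reproduced here; everything below (model, sizes, epsilon) is an illustrative assumption.

```python
import numpy as np

def fgsm_step(x, y, w, eps=0.2):
    """One untargeted FGSM step against a toy linear softmax classifier.

    Ascends the cross-entropy loss w.r.t. the input -- the generic
    gradient-based input-manipulation pattern (OWASP ML01) that TLGA
    instantiates with temporal Lipschitz guidance (not shown here)."""
    logits = x @ w
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad_logits = p.copy()
    grad_logits[y] -= 1.0              # d(cross-entropy)/d(logits)
    grad_x = w @ grad_logits           # chain rule back to the input
    return x + eps * np.sign(grad_x)   # signed gradient ascent on the loss

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 5))           # 16-dim input, 5 classes (toy sizes)
x = rng.normal(size=16)
y = int(np.argmax(x @ w))              # start from a correctly classified input
x_adv = fgsm_step(x, y, w)             # perturbed input with higher loss on y
```

An adversarial-training defense in the spirit of J-TLAT would generate `x_adv` inside the training loop and minimize the loss on the perturbed batch; the paper's joint router/expert training objective is more involved than this single-model picture.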


Details

Domains
vision
Model Types
transformer
Threat Tags
white_box · inference_time · untargeted · digital
Applications
video understanding · action recognition · video-language modeling