Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts

Songping Wang 1, Qinglong Liu 1, Yueming Lyu 1, Ningyuan Li 2, Ziwen He 3, Caifeng Shan 1

1 citation · 65 references · arXiv (Cornell University)

Published on arXiv

2602.01369

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Joint adversarial training (J-TLAT) consistently enhances adversarial robustness across diverse datasets and architectures while reducing inference cost by more than 60% compared to dense models.

J-TLGA / J-TLAT

Novel technique introduced


Mixture-of-Experts (MoE) has demonstrated strong performance in video understanding tasks, yet its adversarial robustness remains underexplored. Existing attack methods often treat MoE as a unified architecture, overlooking the independent and collaborative weaknesses of key components such as routers and expert modules. To fill this gap, we propose Temporal Lipschitz-Guided Attacks (TLGA) to thoroughly investigate component-level vulnerabilities in video MoE models. We first design attacks on the router, revealing its independent weaknesses. Building on this, we introduce Joint Temporal Lipschitz-Guided Attacks (J-TLGA), which collaboratively perturb both routers and experts. This joint attack significantly amplifies adversarial effects and exposes the Achilles' Heel (collaborative weaknesses) of the MoE architecture. Based on these insights, we further propose Joint Temporal Lipschitz Adversarial Training (J-TLAT). J-TLAT performs joint training to further defend against collaborative weaknesses, enhancing component-wise robustness. Our framework is plug-and-play and reduces inference cost by more than 60% compared with dense models. It consistently enhances adversarial robustness across diverse datasets and architectures, effectively mitigating both the independent and collaborative weaknesses of MoE.
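To make the "component-level" attack surface concrete, the sketch below shows a generic top-k softmax gating router, the kind of component TLGA targets. The gate sizes, weights, and routing rule here are illustrative assumptions, not the paper's architecture; the point is only that expert selection is a discrete decision computed from the input, so a perturbation near a routing boundary can change which experts fire.

```python
import numpy as np

def top_k_route(token, w_gate, k=2):
    """Softmax gating over experts; return the indices of the top-k experts.

    In a video MoE, a small input perturbation near a routing decision
    boundary can flip which experts are selected -- the router-level
    attack surface that TLGA probes independently of the experts.
    (Toy sizes and weights; not the paper's model.)"""
    logits = token @ w_gate                        # (num_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return set(np.argsort(probs)[-k:].tolist())

rng = np.random.default_rng(0)
w_gate = rng.normal(size=(8, 4))                   # 8-dim token, 4 experts
token = rng.normal(size=8)

clean_experts = top_k_route(token, w_gate)
# Nudge the token along the direction separating two experts' gate vectors;
# depending on how close the token is to the boundary, routing may change.
direction = w_gate[:, 0] - w_gate[:, 1]
perturbed_experts = top_k_route(token + 0.5 * direction / np.linalg.norm(direction), w_gate)
print(clean_experts, perturbed_experts)
```

Because routing is discrete, the experts downstream of a flipped decision never see the token at all, which is why router and expert weaknesses can compound when attacked jointly, as J-TLGA does.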


Key Contributions

  • TLGA/J-TLGA: first component-level adversarial attacks targeting routers and experts in video MoE, using Temporal Lipschitz-guided perturbations to reveal collaborative weaknesses
  • J-TLAT: joint adversarial training defense that hierarchically hardens router and expert components, mitigating both independent and collaborative vulnerabilities
  • Plug-and-play framework that reduces inference cost by over 60% versus dense models while consistently improving adversarial robustness across architectures and datasets

🛡️ Threat Analysis

Input Manipulation Attack

TLGA and J-TLGA are gradient-based adversarial perturbation attacks that cause misclassification at inference time; J-TLAT is adversarial training proposed as a defense. Both the attacks and the defense map directly to ML01 (input manipulation and adversarial robustness).
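As a minimal sketch of the ML01 attack class, the snippet below runs one untargeted FGSM-style step on a toy linear softmax classifier. TLGA/J-TLGA are gradient-based like this, but their temporal Lipschitz guidance and router/expert loss terms are not reproduced here; everything below (model, sizes, epsilon) is an illustrative assumption.

```python
import numpy as np

def fgsm_step(x, y, w, eps=0.2):
    """One untargeted FGSM step against a toy linear softmax classifier.

    Ascends the cross-entropy loss w.r.t. the input -- the generic
    gradient-based input-manipulation pattern (OWASP ML01) that TLGA
    instantiates with temporal Lipschitz guidance (not shown here)."""
    logits = x @ w
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad_logits = p.copy()
    grad_logits[y] -= 1.0              # d(cross-entropy)/d(logits)
    grad_x = w @ grad_logits           # chain rule back to the input
    return x + eps * np.sign(grad_x)   # signed gradient ascent on the loss

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 5))           # 16-dim input, 5 classes (toy sizes)
x = rng.normal(size=16)
y = int(np.argmax(x @ w))              # start from a correctly classified input
x_adv = fgsm_step(x, y, w)             # perturbed input with higher loss on y
```

An adversarial-training defense in the spirit of J-TLAT would generate `x_adv` inside the training loop and minimize the loss on the perturbed batch; the paper's joint router/expert training objective is more involved than this single-model picture.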


Details

Domains
vision
Model Types
transformer
Threat Tags
white_box · inference_time · untargeted · digital
Applications
video understanding · action recognition · video-language modeling