MS-GAGA: Metric-Selective Guided Adversarial Generation Attack
Dion Jia Xu Ho 1,2, Gabriel Lee Jun Rong 2,3, Niharika Shrivastava 2, Harshavardhan Abichandani, Pai Chet Ng 2, Xiaoxiao Miao 3
Published on arXiv: 2510.12468
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
MS-GAGA achieves up to 27% higher misclassification rates on unseen black-box deepfake detectors compared to state-of-the-art adversarial attacks.
MS-GAGA
Novel technique introduced
We present MS-GAGA (Metric-Selective Guided Adversarial Generation Attack), a two-stage framework for crafting transferable and visually imperceptible adversarial examples against deepfake detectors in black-box settings. In Stage 1, a dual-stream attack module generates adversarial candidates: MNTD-PGD applies enhanced gradient calculations optimized for small perturbation budgets, while SG-PGD focuses perturbations on visually salient regions. This complementary design expands the adversarial search space and improves transferability across unseen models. In Stage 2, a metric-aware selection module evaluates candidates based on both their success against black-box models and their structural similarity (SSIM) to the original image. By jointly optimizing transferability and imperceptibility, MS-GAGA achieves up to 27% higher misclassification rates on unseen detectors compared to state-of-the-art attacks.
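To make the two Stage 1 streams concrete, here is a minimal PyTorch sketch. The summary does not spell out MNTD-PGD's enhanced gradient calculation or SG-PGD's saliency definition, so the first routine substitutes a momentum-accumulated PGD update and the second masks updates to high gradient-magnitude pixels; `model`, the target label, and the budgets (`eps`, `alpha`, `steps`) are all illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def momentum_pgd(model, x, y_real, eps=4/255, alpha=1/255, steps=20, mu=1.0):
    """Stand-in for MNTD-PGD: targeted momentum PGD that nudges a fake image
    toward the detector's 'real' label under a small L-inf budget."""
    x_adv, g = x.clone().detach(), torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_real)
        grad, = torch.autograd.grad(loss, x_adv)
        g = mu * g + grad / grad.abs().mean().clamp_min(1e-12)  # MI-FGSM-style momentum
        x_adv = (x_adv - alpha * g.sign()).detach()             # descend: make 'real' more likely
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project into the eps-ball
    return x_adv

def saliency_guided_pgd(model, x, y_real, eps=4/255, alpha=1/255, steps=20, keep=0.25):
    """Stand-in for SG-PGD: confines perturbations to the top-`keep` fraction
    of pixels ranked by input-gradient magnitude (an assumed salience proxy)."""
    x0 = x.clone().detach().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x0), y_real), x0)
    sal = grad.abs().amax(dim=1, keepdim=True)                  # per-pixel saliency map
    thresh = torch.quantile(sal.flatten(1), 1 - keep, dim=1).view(-1, 1, 1, 1)
    mask = (sal >= thresh).float()                              # perturb salient pixels only
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_real)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv - alpha * grad.sign() * mask).detach()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv
```

Running both streams on the same input yields the complementary candidate pool that Stage 2 then filters.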
Key Contributions
- Dual-stream attack module combining MNTD-PGD (adaptive gradient optimization for small perturbation budgets) and SG-PGD (saliency-guided perturbations) to expand the adversarial search space and improve cross-model transferability
- Metric-aware selection module that jointly ranks adversarial candidates by black-box misclassification success and SSIM perceptual similarity for high-fidelity, transferable output (see the selection sketch after this list)
- Achieves up to 27% higher misclassification rates on unseen black-box deepfake detectors compared to state-of-the-art attacks
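A minimal sketch of that Stage 2 selection, assuming candidates are HxWx3 float images in [0, 1] scored against held-out surrogate detectors and gated by an SSIM floor; the floor value, label convention, and ranking rule here are assumptions, not the paper's exact criterion:

```python
import numpy as np
from skimage.metrics import structural_similarity

def select_candidate(original, candidates, heldout_detectors, ssim_floor=0.95):
    """Rank Stage 1 candidates jointly by black-box success and imperceptibility:
    most detectors fooled first, highest SSIM to the original as tie-breaker."""
    scored = []
    for cand in candidates:
        fooled = sum(d(cand) == "real" for d in heldout_detectors)  # evasion successes
        ssim = structural_similarity(original, cand, channel_axis=2, data_range=1.0)
        if ssim >= ssim_floor:                  # discard visibly distorted candidates
            scored.append((fooled, ssim, cand))
    if not scored:
        return None  # nothing is both transferable and imperceptible enough
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return scored[0][2]
```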
🛡️ Threat Analysis
MS-GAGA crafts gradient-based adversarial perturbations (via its MNTD-PGD and SG-PGD variants) that cause deepfake detectors to misclassify synthetic images as real at inference time, a classic input manipulation/evasion attack. Its novel contributions are the dual-stream attack strategy and the metric-aware selection module, both aimed at maximizing black-box transferability and imperceptibility.
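The headline 27% figure is a misclassification-rate comparison, which a defender could reproduce in principle with something like the sketch below; the label-returning detector interface is an assumption, since this summary does not list the paper's evaluation models.

```python
def misclassification_rate(adv_images, detector):
    """Fraction of adversarial fakes an unseen black-box detector labels 'real'."""
    fooled = sum(detector(img) == "real" for img in adv_images)
    return fooled / len(adv_images)
```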