
MS-GAGA: Metric-Selective Guided Adversarial Generation Attack

Dion Jia Xu Ho 1,2, Gabriel Lee Jun Rong 2,3, Niharika Shrivastava 2, Harshavardhan Abichandani, Pai Chet Ng 2, Xiaoxiao Miao 3

2 citations · 46 references

Published on arXiv: 2510.12468

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

MS-GAGA achieves up to 27% higher misclassification rates on unseen black-box deepfake detectors compared to state-of-the-art adversarial attacks.

MS-GAGA

Novel technique introduced


We present MS-GAGA (Metric-Selective Guided Adversarial Generation Attack), a two-stage framework for crafting transferable and visually imperceptible adversarial examples against deepfake detectors in black-box settings. In Stage 1, a dual-stream attack module generates adversarial candidates: MNTD-PGD applies enhanced gradient calculations optimized for small perturbation budgets, while SG-PGD focuses perturbations on visually salient regions. This complementary design expands the adversarial search space and improves transferability across unseen models. In Stage 2, a metric-aware selection module evaluates candidates based on both their success against black-box models and their structural similarity (SSIM) to the original image. By jointly optimizing transferability and imperceptibility, MS-GAGA achieves up to 27% higher misclassification rates on unseen detectors compared to state-of-the-art attacks.
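The Stage 2 logic described above (rank candidates jointly by black-box success and SSIM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `select_candidate` and `ssim_global` are hypothetical, and `ssim_global` is a simplified single-window SSIM rather than the windowed version typically used in practice.

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM for images scaled to [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def select_candidate(original, candidates, surrogate_success):
    """Pick the adversarial candidate that fools the most surrogate
    models, breaking ties by SSIM similarity to the original image."""
    scored = [(surrogate_success[i], ssim_global(original, c), i)
              for i, c in enumerate(candidates)]
    scored.sort(reverse=True)           # success first, then SSIM
    return candidates[scored[0][2]]
```

In this toy ranking, transferability (surrogate success count) dominates and SSIM only breaks ties; the paper's metric-aware module jointly optimizes both, so the exact weighting would differ.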


Key Contributions

  • Dual-stream attack module combining MNTD-PGD (adaptive gradient optimization for small perturbation budgets) and SG-PGD (saliency-guided perturbations) to expand the adversarial search space and improve cross-model transferability
  • Metric-aware selection module that jointly ranks adversarial candidates by black-box misclassification success and SSIM perceptual similarity for high-fidelity, transferable output
  • Achieves up to 27% higher misclassification rates on unseen black-box deepfake detectors compared to state-of-the-art attacks
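Both attack streams are PGD variants, so a plain L-infinity PGD loop is the common skeleton. The sketch below attacks a logistic-regression "detector" (chosen so the input gradient is analytic); `pgd_attack` and its parameters are illustrative stand-ins, not MNTD-PGD's enhanced gradient calculation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_attack(x, y, w, b, eps=0.05, alpha=0.01, steps=40):
    """L-infinity PGD against a logistic-regression detector.

    For logit z = w.x + b and label y, the BCE gradient w.r.t. the
    input is (sigmoid(z) - y) * w; each step ascends its sign.
    """
    x_adv = x.copy()
    for _ in range(steps):
        grad = (sigmoid(w @ x_adv + b) - y) * w   # ascend the loss
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # valid pixel range
    return x_adv
```

The projection step is what keeps the perturbation budget small; MNTD-PGD's contribution is a gradient calculation tuned for exactly this small-budget regime.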

🛡️ Threat Analysis

Input Manipulation Attack

MS-GAGA proposes gradient-based adversarial perturbations (MNTD-PGD and SG-PGD variants) that cause deepfake detectors to misclassify synthetic images as real at inference time — a classic input manipulation/evasion attack. The novel contributions are the dual-stream attack strategy and metric-aware selection module, both aimed at maximizing black-box transferability and imperceptibility.
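The saliency-guided idea can be sketched as masking the perturbation to the highest-gradient pixels. `saliency_mask` and `masked_step` are hypothetical helper names for illustration; SG-PGD's actual saliency computation is not reproduced here.

```python
import numpy as np

def saliency_mask(grad, keep_frac=0.1):
    """Binary mask keeping the top keep_frac pixels by |gradient|,
    a stand-in for the visually salient regions SG-PGD perturbs."""
    flat = np.abs(grad).ravel()
    k = max(1, int(keep_frac * flat.size))
    thresh = np.partition(flat, -k)[-k]       # k-th largest magnitude
    return (np.abs(grad) >= thresh).astype(grad.dtype)

def masked_step(x, grad, mask, alpha=0.01):
    """One signed-gradient step restricted to masked pixels."""
    return np.clip(x + alpha * mask * np.sign(grad), 0.0, 1.0)
```

Restricting the perturbation to salient regions concentrates the budget where the detector is most sensitive, which is one way the dual-stream design expands the adversarial search space relative to a uniform perturbation.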


Details

Domains
vision
Model Types
cnn, transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
FaceForensics++
Applications
deepfake detection, facial forgery detection