MS-GAGA: Metric-Selective Guided Adversarial Generation Attack
Dion Jia Xu Ho 1,2, Gabriel Lee Jun Rong 2,3, Niharika Shrivastava 2, Harshavardhan Abichandani, Pai Chet Ng 2, Xiaoxiao Miao 3
Published on arXiv: 2510.12468
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
MS-GAGA achieves up to 27% higher misclassification rates on unseen black-box deepfake detectors compared to state-of-the-art adversarial attacks.
MS-GAGA
Novel technique introduced
We present MS-GAGA (Metric-Selective Guided Adversarial Generation Attack), a two-stage framework for crafting transferable and visually imperceptible adversarial examples against deepfake detectors in black-box settings. In Stage 1, a dual-stream attack module generates adversarial candidates: MNTD-PGD applies enhanced gradient calculations optimized for small perturbation budgets, while SG-PGD focuses perturbations on visually salient regions. This complementary design expands the adversarial search space and improves transferability across unseen models. In Stage 2, a metric-aware selection module evaluates candidates based on both their success against black-box models and their structural similarity (SSIM) to the original image. By jointly optimizing transferability and imperceptibility, MS-GAGA achieves up to 27% higher misclassification rates on unseen detectors compared to state-of-the-art attacks.
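To make the two Stage 1 streams concrete, here is a minimal PyTorch sketch. The summary does not spell out MNTD-PGD's enhanced gradient calculation or SG-PGD's saliency definition, so the first routine substitutes a momentum-accumulated PGD update and the second masks updates to high gradient-magnitude pixels; `model`, the target label, and the budgets (`eps`, `alpha`, `steps`) are all illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def momentum_pgd(model, x, y_real, eps=4/255, alpha=1/255, steps=20, mu=1.0):
    """Stand-in for MNTD-PGD: targeted momentum PGD that nudges a fake image
    toward the detector's 'real' label under a small L-inf budget."""
    x_adv, g = x.clone().detach(), torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_real)
        grad, = torch.autograd.grad(loss, x_adv)
        g = mu * g + grad / grad.abs().mean().clamp_min(1e-12)  # MI-FGSM-style momentum
        x_adv = (x_adv - alpha * g.sign()).detach()             # descend: make 'real' more likely
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project into the eps-ball
    return x_adv

def saliency_guided_pgd(model, x, y_real, eps=4/255, alpha=1/255, steps=20, keep=0.25):
    """Stand-in for SG-PGD: confines perturbations to the top-`keep` fraction
    of pixels ranked by input-gradient magnitude (an assumed salience proxy)."""
    x0 = x.clone().detach().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x0), y_real), x0)
    sal = grad.abs().amax(dim=1, keepdim=True)                  # per-pixel saliency map
    thresh = torch.quantile(sal.flatten(1), 1 - keep, dim=1).view(-1, 1, 1, 1)
    mask = (sal >= thresh).float()                              # perturb salient pixels only
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_real)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv - alpha * grad.sign() * mask).detach()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv
```

Running both streams on the same input yields the complementary candidate pool that Stage 2 then filters.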
Key Contributions
- Dual-stream attack module combining MNTD-PGD (adaptive gradient optimization for small perturbation budgets) and SG-PGD (saliency-guided perturbations) to expand the adversarial search space and improve cross-model transferability
- Metric-aware selection module that jointly ranks adversarial candidates by black-box misclassification success and SSIM perceptual similarity for high-fidelity, transferable output (see the selection sketch after this list)
- Achieves up to 27% higher misclassification rates on unseen black-box deepfake detectors compared to state-of-the-art attacks
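A minimal sketch of that Stage 2 selection, assuming candidates are HxWx3 float images in [0, 1] scored against held-out surrogate detectors and gated by an SSIM floor; the floor value, label convention, and ranking rule here are assumptions, not the paper's exact criterion:

```python
import numpy as np
from skimage.metrics import structural_similarity

def select_candidate(original, candidates, heldout_detectors, ssim_floor=0.95):
    """Rank Stage 1 candidates jointly by black-box success and imperceptibility:
    most detectors fooled first, highest SSIM to the original as tie-breaker."""
    scored = []
    for cand in candidates:
        fooled = sum(d(cand) == "real" for d in heldout_detectors)  # evasion successes
        ssim = structural_similarity(original, cand, channel_axis=2, data_range=1.0)
        if ssim >= ssim_floor:                  # discard visibly distorted candidates
            scored.append((fooled, ssim, cand))
    if not scored:
        return None  # nothing is both transferable and imperceptible enough
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return scored[0][2]
```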
🛡️ Threat Analysis
MS-GAGA crafts gradient-based adversarial perturbations (via its MNTD-PGD and SG-PGD variants) that cause deepfake detectors to misclassify synthetic images as real at inference time, a classic input manipulation/evasion attack. Its novel contributions are the dual-stream attack strategy and the metric-aware selection module, both aimed at maximizing black-box transferability and imperceptibility.
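The headline 27% figure is a misclassification-rate comparison, which a defender could reproduce in principle with something like the sketch below; the label-returning detector interface is an assumption, since this summary does not list the paper's evaluation models.

```python
def misclassification_rate(adv_images, detector):
    """Fraction of adversarial fakes an unseen black-box detector labels 'real'."""
    fooled = sum(detector(img) == "real" for img in adv_images)
    return fooled / len(adv_images)
```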