ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks
Gabriel Lee Jun Rong 1, Christos Korgialas 2, Dion Jia Xu Ho 3, Pai Chet Ng 4, Xiaoxiao Miao, Konstantinos N. Plataniotis 5
Published on arXiv
2601.18386
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
ARMOR achieves improved cross-architecture transfer and reliably evades both white-box and black-box deepfake detectors by blending CW, JSMA, and STA perturbations guided by semantic VLM analysis
ARMOR
Novel technique introduced
Existing automated attack suites operate as static ensembles with fixed sequences, lacking strategic adaptation and semantic awareness. This paper introduces the Agentic Reasoning for Methods Orchestration and Reparameterization (ARMOR) framework to address these limitations. ARMOR orchestrates three canonical adversarial primitives, Carlini-Wagner (CW), the Jacobian-based Saliency Map Attack (JSMA), and Spatially Transformed Attacks (STA), via Vision Language Model (VLM)-guided agents that collaboratively generate and synthesize perturbations through a shared "Mixing Desk". Large Language Models (LLMs) adaptively tune and reparameterize parallel attack agents in a real-time, closed-loop system that exploits image-specific semantic vulnerabilities. On standard benchmarks, ARMOR achieves improved cross-architecture transfer and reliably fools detectors in both settings: it delivers a blended output for blind (black-box) targets and, for white-box targets, selects the best single or blended attack using a confidence-and-SSIM score.
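The closed-loop idea above can be sketched in a few lines: an attack runs, a detector scores the result, and an advisor (standing in for the paper's LLM agent) updates the attack's hyperparameters from that feedback. This is a minimal illustration, not the paper's implementation; the `attack`, `detector`, and `advisor` stand-ins below are hypothetical toys.

```python
import numpy as np

def closed_loop_attack(image, attack, detector, advisor, params, rounds=5):
    """Closed-loop reparameterization sketch: after each attack round, an
    advisor updates the attack's hyperparameters based on detector feedback.
    All component names here are illustrative, not from the paper's code."""
    best_adv, best_conf = image, detector(image)
    for _ in range(rounds):
        adv = attack(image, **params)
        conf = detector(adv)            # detector's confidence in its verdict
        if conf < best_conf:            # lower confidence = better evasion
            best_adv, best_conf = adv, conf
        params = advisor(params, conf)  # feedback-driven reparameterization
    return best_adv, best_conf

# Toy stand-ins: an L_inf-bounded noise "attack", a mean-based "detector",
# and an advisor that enlarges the budget while confidence stays high.
rng = np.random.default_rng(0)
attack = lambda x, eps: np.clip(x + rng.uniform(-eps, eps, x.shape), 0.0, 1.0)
detector = lambda x: float(x.mean())
advisor = lambda p, conf: {"eps": p["eps"] * (1.5 if conf > 0.1 else 1.0)}

adv, conf = closed_loop_attack(np.full((8, 8), 0.8), attack, detector, advisor, {"eps": 0.01})
```

In ARMOR the advisor role is played by an LLM (Qwen3-32B) reasoning over VLM-derived semantic analysis rather than a fixed heuristic, but the control flow is the same closed loop.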
Key Contributions
- ARMOR multi-agent framework using VLMs (Qwen2.5-VL) for image-semantic analysis and LLMs (Qwen3-32B) as Advisor agents to adaptively reparameterize CW, JSMA, and STA attacks in a closed loop
- Shared 'Mixing Desk' that blends heterogeneous perturbation geometries (dense, sparse, geometric) optimized via randomized hill climbing on a confidence-and-SSIM score
- Demonstrated improved cross-architecture black-box transfer against ViT-based deepfake detectors under a common l_inf budget compared to static ensemble baselines like AutoAttack
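The Mixing Desk contribution can be sketched as a weighted blend of the three perturbation geometries, with the weights found by randomized hill climbing on a combined confidence-and-SSIM score. The sketch below assumes a score of the form (1 − detector confidence) + SSIM, uses a simplified single-window SSIM, and invents the function names; the paper's actual scoring and optimizer details may differ.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM over whole images in [0, 1]
    (full SSIM uses local windows; this global form suffices for a sketch)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def mixing_desk(image, perturbations, detector, steps=200, step_size=0.1, seed=0):
    """Blend heterogeneous perturbations (dense/sparse/geometric) with weights
    found by randomized hill climbing on a confidence-and-SSIM score.
    Hypothetical sketch: low detector confidence + high similarity is rewarded."""
    rng = np.random.default_rng(seed)
    w = np.full(len(perturbations), 1.0 / len(perturbations))

    def score(weights):
        adv = np.clip(image + sum(wi * p for wi, p in zip(weights, perturbations)), 0.0, 1.0)
        return (1.0 - detector(adv)) + ssim_global(image, adv)

    best = score(w)
    for _ in range(steps):
        cand = np.clip(w + rng.normal(0.0, step_size, w.shape), 0.0, 1.0)
        s = score(cand)
        if s > best:               # keep the candidate only if it improves
            w, best = cand, s
    adv = np.clip(image + sum(wi * p for wi, p in zip(w, perturbations)), 0.0, 1.0)
    return adv, w, best

# Toy usage: three stand-in perturbations and a mean-based "detector".
img = np.full((8, 8), 0.5)
perts = [np.full((8, 8), 0.01), np.full((8, 8), -0.01), np.zeros((8, 8))]
adv, weights, best = mixing_desk(img, perts, lambda a: float(a.mean()), steps=50)
```

Randomized hill climbing is a natural fit here because the blend weights form a small, cheap-to-evaluate search space and the score (through the black-box detector) is not differentiable.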
🛡️ Threat Analysis
Core contribution is crafting adversarial examples (via CW, JSMA, STA) that cause misclassification in deepfake detection models at inference time; the VLM/LLM agents are attack orchestration tools, not targets.