ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks
Gabriel Lee Jun Rong 1, Christos Korgialas 2, Dion Jia Xu Ho 3, Pai Chet Ng 4, Xiaoxiao Miao, Konstantinos N. Plataniotis 5
Published on arXiv
2601.18386
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
ARMOR achieves improved cross-architecture transfer and reliably evades both white-box and black-box deepfake detectors by blending CW, JSMA, and STA perturbations guided by semantic VLM analysis
ARMOR
Novel technique introduced
Existing automated attack suites operate as static ensembles with fixed sequences, lacking strategic adaptation and semantic awareness. This paper introduces the Agentic Reasoning for Methods Orchestration and Reparameterization (ARMOR) framework to address these limitations. ARMOR orchestrates three canonical adversarial primitives, Carlini-Wagner (CW), the Jacobian-based Saliency Map Attack (JSMA), and Spatially Transformed Attacks (STA), via Vision Language Model (VLM)-guided agents that collaboratively generate and synthesize perturbations through a shared "Mixing Desk". Large Language Models (LLMs) adaptively tune and reparameterize parallel attack agents in a real-time, closed-loop system that exploits image-specific semantic vulnerabilities. On standard benchmarks, ARMOR achieves improved cross-architecture transfer and reliably fools detectors in both settings: it delivers a blended output for blind (black-box) targets and, for white-box targets, selects the best single or blended attack using a confidence-and-SSIM score.
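The closed-loop idea above can be sketched in a few lines: an attack runs, a detector scores the result, and an advisor (standing in for the paper's LLM agent) updates the attack's hyperparameters from that feedback. This is a minimal illustration, not the paper's implementation; the `attack`, `detector`, and `advisor` stand-ins below are hypothetical toys.

```python
import numpy as np

def closed_loop_attack(image, attack, detector, advisor, params, rounds=5):
    """Closed-loop reparameterization sketch: after each attack round, an
    advisor updates the attack's hyperparameters based on detector feedback.
    All component names here are illustrative, not from the paper's code."""
    best_adv, best_conf = image, detector(image)
    for _ in range(rounds):
        adv = attack(image, **params)
        conf = detector(adv)            # detector's confidence in its verdict
        if conf < best_conf:            # lower confidence = better evasion
            best_adv, best_conf = adv, conf
        params = advisor(params, conf)  # feedback-driven reparameterization
    return best_adv, best_conf

# Toy stand-ins: an L_inf-bounded noise "attack", a mean-based "detector",
# and an advisor that enlarges the budget while confidence stays high.
rng = np.random.default_rng(0)
attack = lambda x, eps: np.clip(x + rng.uniform(-eps, eps, x.shape), 0.0, 1.0)
detector = lambda x: float(x.mean())
advisor = lambda p, conf: {"eps": p["eps"] * (1.5 if conf > 0.1 else 1.0)}

adv, conf = closed_loop_attack(np.full((8, 8), 0.8), attack, detector, advisor, {"eps": 0.01})
```

In ARMOR the advisor role is played by an LLM (Qwen3-32B) reasoning over VLM-derived semantic analysis rather than a fixed heuristic, but the control flow is the same closed loop.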
Key Contributions
- ARMOR multi-agent framework using VLMs (Qwen2.5-VL) for image-semantic analysis and LLMs (Qwen3-32B) as Advisor agents to adaptively reparameterize CW, JSMA, and STA attacks in a closed loop
- Shared 'Mixing Desk' that blends heterogeneous perturbation geometries (dense, sparse, geometric) optimized via randomized hill climbing on a confidence-and-SSIM score
- Demonstrated improved cross-architecture black-box transfer against ViT-based deepfake detectors under a common l_inf budget compared to static ensemble baselines like AutoAttack
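The Mixing Desk contribution can be sketched as a weighted blend of the three perturbation geometries, with the weights found by randomized hill climbing on a combined confidence-and-SSIM score. The sketch below assumes a score of the form (1 − detector confidence) + SSIM, uses a simplified single-window SSIM, and invents the function names; the paper's actual scoring and optimizer details may differ.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM over whole images in [0, 1]
    (full SSIM uses local windows; this global form suffices for a sketch)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def mixing_desk(image, perturbations, detector, steps=200, step_size=0.1, seed=0):
    """Blend heterogeneous perturbations (dense/sparse/geometric) with weights
    found by randomized hill climbing on a confidence-and-SSIM score.
    Hypothetical sketch: low detector confidence + high similarity is rewarded."""
    rng = np.random.default_rng(seed)
    w = np.full(len(perturbations), 1.0 / len(perturbations))

    def score(weights):
        adv = np.clip(image + sum(wi * p for wi, p in zip(weights, perturbations)), 0.0, 1.0)
        return (1.0 - detector(adv)) + ssim_global(image, adv)

    best = score(w)
    for _ in range(steps):
        cand = np.clip(w + rng.normal(0.0, step_size, w.shape), 0.0, 1.0)
        s = score(cand)
        if s > best:               # keep the candidate only if it improves
            w, best = cand, s
    adv = np.clip(image + sum(wi * p for wi, p in zip(w, perturbations)), 0.0, 1.0)
    return adv, w, best

# Toy usage: three stand-in perturbations and a mean-based "detector".
img = np.full((8, 8), 0.5)
perts = [np.full((8, 8), 0.01), np.full((8, 8), -0.01), np.zeros((8, 8))]
adv, weights, best = mixing_desk(img, perts, lambda a: float(a.mean()), steps=50)
```

Randomized hill climbing is a natural fit here because the blend weights form a small, cheap-to-evaluate search space and the score (through the black-box detector) is not differentiable.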
🛡️ Threat Analysis
Core contribution is crafting adversarial examples (via CW, JSMA, STA) that cause misclassification in deepfake detection models at inference time; the VLM/LLM agents are attack orchestration tools, not targets.