
ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks

Gabriel Lee Jun Rong 1, Christos Korgialas 2, Dion Jia Xu Ho 3, Pai Chet Ng 4, Xiaoxiao Miao, Konstantinos N. Plataniotis 5

0 citations · 36 references · arXiv


Published on arXiv

2601.18386

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

ARMOR achieves improved cross-architecture transfer and reliably evades both white-box and black-box deepfake detectors by blending CW, JSMA, and STA perturbations guided by semantic VLM analysis

ARMOR

Novel technique introduced


Existing automated attack suites operate as static ensembles with fixed sequences, lacking strategic adaptation and semantic awareness. This paper introduces the Agentic Reasoning for Methods Orchestration and Reparameterization (ARMOR) framework to address these limitations. ARMOR orchestrates three canonical adversarial primitives (Carlini-Wagner (CW), the Jacobian-based Saliency Map Attack (JSMA), and Spatially Transformed Attacks (STA)) via Vision Language Model (VLM)-guided agents that collaboratively generate and synthesize perturbations through a shared "Mixing Desk". Large Language Models (LLMs) adaptively tune and reparameterize parallel attack agents in a real-time, closed-loop system that exploits image-specific semantic vulnerabilities. On standard benchmarks, ARMOR achieves improved cross-architecture transfer and reliably evades detectors in both settings: it delivers a blended output for blind (black-box) targets and selects the best single or blended attack for white-box targets using a combined confidence-and-SSIM score.
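The closed-loop reparameterization described above can be sketched in miniature: an Advisor agent observes whether the last attack round improved the score and proposes new hyperparameters for the next round. This is a hypothetical skeleton, not the paper's implementation; the names `advise`, `run_attack`, and the simple step-size rule standing in for the LLM Advisor are all illustrative assumptions.

```python
def advise(params, improved):
    """Stand-in for the LLM Advisor agent: refine the step size after an
    improvement, widen the search after a failure (illustrative rule)."""
    step = params["step"] * (0.5 if improved else 1.5)
    return {**params, "step": min(step, 1.0)}

def closed_loop(x, run_attack, score, params, rounds=5):
    """Run attack rounds, feeding each score back to the advisor."""
    best_adv, best_s = None, float("-inf")
    for _ in range(rounds):
        x_adv = run_attack(x, params)   # one attack agent invocation
        s = score(x, x_adv)             # e.g. a confidence-and-SSIM score
        improved = s > best_s
        if improved:
            best_adv, best_s = x_adv, s
        params = advise(params, improved)
    return best_adv, best_s
```

In ARMOR the advisor is an LLM (Qwen3-32B) reasoning over VLM feedback rather than a fixed rule, and several such loops run in parallel, one per attack primitive.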


Key Contributions

  • ARMOR multi-agent framework using VLMs (Qwen2.5-VL) for image-semantic analysis and LLMs (Qwen3-32B) as Advisor agents that adaptively reparameterize CW, JSMA, and STA attacks in a closed loop
  • Shared "Mixing Desk" that blends heterogeneous perturbation geometries (dense, sparse, geometric), optimized via randomized hill climbing on a combined confidence-and-SSIM score
  • Demonstrated improved cross-architecture black-box transfer against ViT-based deepfake detectors under a common l_inf budget, compared to static ensemble baselines such as AutoAttack
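The Mixing Desk idea can be illustrated as a randomized hill climb over convex blend weights for the three perturbation maps, scored by detector confidence plus image similarity. This is a minimal sketch under stated assumptions: the single-window SSIM, the scoring weights, and the function names are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_ssim(a, b, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM between two images in [0, 1]
    (the paper's exact similarity term is not reproduced here)."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2))

def score(x, x_adv, target_conf, lam=0.5):
    """Higher is better: low detector confidence, high visual similarity."""
    return (1.0 - target_conf(x_adv)) + lam * global_ssim(x, x_adv)

def hill_climb_blend(x, deltas, target_conf, eps=8 / 255, steps=200):
    """Randomized hill climbing over convex blend weights for the
    dense (CW-like), sparse (JSMA-like), and geometric (STA-like) deltas."""
    w = np.ones(len(deltas)) / len(deltas)

    def apply(w):
        delta = sum(wi * d for wi, d in zip(w, deltas))
        # enforce the shared l_inf budget and valid pixel range
        return np.clip(x + np.clip(delta, -eps, eps), 0.0, 1.0)

    best = score(x, apply(w), target_conf)
    for _ in range(steps):
        cand = np.clip(w + rng.normal(scale=0.1, size=w.shape), 0, None)
        cand /= cand.sum()  # keep the blend weights convex
        s = score(x, apply(cand), target_conf)
        if s > best:
            w, best = cand, s
    return w, apply(w)
```

A real STA perturbation is a spatial warp rather than an additive delta; treating all three primitives as additive maps here is a simplification for the sake of a short, runnable example.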

🛡️ Threat Analysis

Input Manipulation Attack

The core contribution is crafting adversarial examples (via CW, JSMA, and STA) that cause misclassification in deepfake detection models at inference time; the VLM/LLM agents serve as attack-orchestration tools, not as targets.


Details

Domains
vision, multimodal, nlp
Model Types
cnn, transformer, vlm, llm
Threat Tags
white_box, black_box, inference_time, digital
Datasets
FaceForensics++
Applications
deepfake detection, image classification